With the increasing complexity of systems found in practical applications, the problem of controller
design is often approached in a hierarchical fashion, with discrete abstractions and design
methods used to satisfy high level task specifications, and continuous abstractions and design
techniques used to satisfy low level control objectives. Although such a separation allows the
application of mature theoretical and computational tools from the realms of computer science and
control theory, the task of ensuring desired closed-loop behaviors, which result from the composition
between discrete and continuous designs, often requires costly and time consuming verification and
validation.  This  problem  becomes  especially  acute  in  safety-critical  applications,  in  which  design 
specifications  are  often  subject  to  rigorous  industry  standards  and  government  regulations.  Hybrid 
systems,  which  feature  state  trajectories  evolving  on  a  combination  of  discrete  and  continuous  state 
spaces,  have  been  proposed  as  a  possible  approach  to  reconcile  the  analysis  and  design  techniques 
from the discrete and continuous domains under a rigorous theoretical framework. However,
designing controllers for general classes of hybrid systems is a highly nontrivial task, as such a
design  problem  inherits  both  the  difficulty  of  nonlinear  control,  as  well  as  the  range  of  theoretical 
and  computational  issues  introduced  by  the  consideration  of  discrete  switching. 

This dissertation describes several efforts aimed towards the development of theoretical
analysis tools and computational synthesis techniques to facilitate the systematic design of feedback
control policies satisfying safety and target attainability specifications with respect to subclasses
of hybrid system models. The main types of problems we consider are safety/invariance problems,
which  involve  keeping  the  closed-loop  state  trajectory  within  a  safe  set  in  the  hybrid  state  space, 
and  reach-avoid  problems,  which  involve  driving  the  state  trajectory  into  a  target  set  subject  to  a 
safety  constraint.  These  problems  are  addressed  within  the  context  of  continuous  time  switched 
nonlinear systems and discrete time stochastic hybrid systems, as motivated by application
scenarios arising in autonomous vehicle control and air traffic management.

First, we provide several design techniques and synthesis algorithms for deterministic
reachability problems formulated in the setting of switched nonlinear systems, with controlled switches
between discrete modes, and bounded continuous disturbances. For scenarios in which the mode
transitions proceed in a known sequence, a method is discussed for designing controllers to satisfy
sequential reachability specifications, consisting of a temporally ordered sequence of invariance
and reach-avoid objectives. In particular, we use continuous time reachable sets to inform choices
of feedback control policies within each discrete mode to satisfy both individual reachability
objectives and compatibility conditions between successive modes. This technique is illustrated through
an example of maneuver sequence design for automated aerial refueling of unmanned aerial
vehicles. For scenarios in which the modes of a switched system can be freely selected, we describe
an approach for the automated synthesis of feedback control policies achieving safety and
reach-avoid objectives, under a sampled data setting. This synthesis technique proceeds by a structured
reachability computation which retains information about the choice of switching controls at each
discrete time instant, resulting in a set-valued policy represented in terms of a finite collection of
reachable sets. Experimental results from the implementation of such control policies on a
quadrotor platform to track a moving ground target show strong robustness properties in the presence of
significant disturbances.

Second, we provide theoretical and computational results on stochastic game and partial
information formulations of probabilistic reachability problems. In the setting of a discrete time
stochastic hybrid game model, zero-sum dynamic game formulations of probabilistic safety and
reach-avoid problems are considered. Under an asymmetric information pattern favoring the
adversary, we prove dynamic programming results for the computation of finite horizon max-min safety
and reach-avoid probabilities and synthesis of deterministic max-min control policies. The
implications of alternative information patterns and infinite horizon formulations are also discussed.
In particular, it is shown that under a symmetric information pattern, equilibrium solutions are in
general found within the class of randomized policies. The utility of this approach is illustrated
through an example of pairwise aircraft conflict resolution, with a probabilistic model of wind
effects. In the setting of a partially observable discrete time stochastic hybrid system, we provide a
characterization of the optimal solution to partial information probabilistic safety and reach-avoid
problems, which have nonstandard multiplicative and sum-multiplicative cost structures. In
particular, these problems are shown to be equivalent to terminal cost and additive cost problems, by
augmenting the hybrid state space with a binary random variable capturing the safety of past state
evolution. Using this result, we derive a sufficient statistic in terms of a set of Bayesian
filtering equations, along with an abstract dynamic programming algorithm for computing the optimal
safety and reach-avoid probabilities. The practical implementation of the estimation and control
algorithms, however, will depend on the existence of finite dimensional representations or
approximations of the hybrid probability distribution.

Chapter 1


Introduction 

1.1  The  Dichotomy  Between  Discrete  and  Continuous 
Abstractions 

The task of controller design for modern control systems such as those found in aircraft, automobiles,
and  industrial  machinery  is  a  highly  complex  undertaking.  This  complexity  results  in  part  from 
the  large  number  of  interacting  system  components,  and  in  part  from  the  wide  range  of  design 
specifications (e.g. comfort, safety, stability, efficiency) that must be satisfied, often with the
expectation of a high degree of reliability. A common approach to controller design for such systems
is  a  layered  control  architecture  in  which  successively  coarser  abstractions  are  employed  as  one 
progresses  from  low  level  control  objectives  to  high  level  specifications.  While  low  level  control 
design  is  often  performed  using  continuous  state  models  (e.g.  differential  or  difference  equations) 
and  implemented  using  analog  devices,  high  level  control  design  is  often  performed  using  finite 
state  models  (e.g.  finite  state  machines)  and  implemented  using  embedded  software  and  electronic 
devices.  In  the  case  of  the  former,  one  can  take  advantage  of  the  rich  set  of  design  and  analysis 
methods  that  has  been  developed  in  the  realm  of  control  theory,  while  in  the  case  of  the  latter,  one 
can  take  advantage  of  the  numerous  efficient  algorithms  that  have  been  proposed  in  the  realm  of 
computer science. However, there is unfortunately a scarcity of formal design tools at the
interface between the two domains. In safety-critical control applications, this presents somewhat of a
dilemma,  as  safety  specifications  are  often  described  in  terms  of  the  closed-loop  behavior  of  the 
overall  system,  and  hence  span  the  different  layers  of  the  control  architecture.  In  particular,  the 
satisfaction  of  such  specifications,  which  are  often  determined  by  rigorous  industry  standards  and 
government  regulation,  depends  intimately  on  the  interaction  between  the  discrete  and  continuous 
layers  of  control. 

To  be  more  concrete,  consider  the  example  of  conflict  detection  and  resolution  in  air  traffic 
management.  Under  current  Federal  Aviation  Administration  (FAA)  regulations,  each  aircraft  is 
required  to  maintain  a  minimum  horizontal  and  vertical  separation  distance  from  other  aircraft  in 
the  airspace.  The  problem  of  conflict  detection  is  one  of  predicting  whether  a  loss  of  separation 
will occur, and the problem of conflict resolution is one of executing evasive maneuvers in the event
that a potential conflict is detected. These maneuvers are commonly composed of a discrete set of
basic  aircraft  motions,  for  example,  accelerate,  turn,  descend,  and  ascend. 

It  can  be  observed  that  the  high  level  decisions  of  when  to  initiate  conflict  resolution  maneuvers 
and how the maneuvers should be carried out are both intimately related to the continuous
behavior of the aircraft involved in the conflict scenario, in particular the kinematics of each aircraft. At
the same time, the conflict resolution problem is not a purely continuous control problem, as the
execution of conflict resolution maneuvers depends to a large degree on the design of the high level
decision protocol. In particular, if the maneuvers are to be carried out according to a pre-defined
sequence,  then  the  conflict  resolution  problem  is  one  of  deciding  when  the  aircraft  should  switch 
from  one  motion  to  the  next.  On  the  other  hand,  if  the  aircraft  is  allowed  to  select  freely  from  a 
library  of  basic  motions,  the  problem  then  becomes  one  of  deciding  both  the  sequence  of  motions, 
as well as the times at which to perform the switch. If one were to consider in addition the
various uncertainties during the execution of conflict resolution maneuvers, for example the unknown
intention of the other aircraft or disturbances to aircraft motion due to wind effects, it is then no
longer sufficient to consider open-loop choices of maneuver sequences and switching times. In this
case, the conflict resolution problem becomes one of designing a feedback policy for discrete
maneuver selection, such that the continuous closed-loop trajectory of the aircraft maintains minimum
separation  distance  at  all  times. 

1.2 High-Confidence Controller Design as Hybrid
Reachability  Problem 

The  main  focus  of  this  dissertation  is  on  the  development  of  theoretical  tools  and  computational 
techniques  for  the  design  of  feedback  control  policies  at  the  interface  between  the  discrete  and 
continuous  layers  of  the  control  architecture,  with  the  objective  of  satisfying  certain  functional 
specifications  on  the  closed-loop  system  behavior.  In  particular,  we  will  be  interested  in  functional 
specifications of the following types: 1) safety: keep the system state within a prescribed safe set
in the system state space over a finite or infinite time horizon; 2) reach-avoid: drive the system state
into  a  prescribed  target  set  in  the  system  state  space  within  finite  time,  subject  to  a  constraint  that 
the  state  trajectory  avoids  an  unsafe  set.  These  specifications  are  often  referred  to  in  the  literature 
collectively  as  reachability  specifications.  Given  that  the  controller  design  must  be  conscious  of 
the  discrete  nature  of  high  level  decision  making,  as  well  as  the  continuous  nature  of  the  physical 
system,  a  natural  modeling  framework  is  that  of  a  hybrid  system. 
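
For concreteness, the two specification types can be written as conditions on a state trajectory. The notation below is an assumption of this sketch rather than the notation of later chapters: $x(\cdot)$ is the closed-loop state trajectory, $W$ the safe set, $R$ the target set, $A$ the unsafe set (assumed disjoint from $R$), and $T$ the time horizon, possibly infinite in the safety case. The reach-avoid condition is stated under one common reading, in which the unsafe set must be avoided until the target is reached.

```latex
\begin{align*}
\text{safety:} \quad & x(t) \in W \quad \text{for all } t \in [0, T], \\
\text{reach-avoid:} \quad & \exists\, \tau \in [0, T] \ \text{such that}\ x(\tau) \in R
  \ \text{and}\ x(t) \notin A \ \text{for all } t \in [0, \tau].
\end{align*}
```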

A  hybrid  system  is  a  dynamical  system  whose  dynamics  evolves  on  a  product  of  discrete  and 
continuous  state  spaces.  The  study  of  such  systems  within  a  formal  mathematical  framework  can 
be  traced  to  the  seminal  work  of  Witsenhausen  (1966).  By  now,  there  is  a  well-developed  body 
of  literature  devoted  to  the  modeling  and  analysis  of  hybrid  systems  (see  for  example  Gollu  and 
Varaiya,  1989;  Brockett,  1993;  Alur  et  al.,  1993;  Antsaklis  et  al.,  1993;  Nerode  and  Kohn,  1993; 
Caines  and  Wei,  1998;  Branicky  et  al.,  1998;  Hu  et  al.,  2000).  These  models  have  been  employed  in 
the study of application scenarios ranging from air traffic management (Sastry et al., 1995; Tomlin
et al., 2002), automotive control (Balluchi et al., 2000), systems biology (Ghosh and Tomlin, 2004;
Lincoln  and  Tiwari,  2004),  to  unmanned  aerial  vehicles  (Frazzoli  et  al.,  2000;  Koo  et  al.,  2001). 
In  certain  cases,  they  are  used  to  capture  the  interaction  between  discrete  and  analog  components 
of  the  physical  system,  for  example  the  use  of  switching  elements  in  the  control  of  electrical  and 
mechanical systems (Almer et al., 2007). In other cases, they are used to capture sharp changes in
the  continuous  behavior  of  a  dynamical  system,  for  example  the  operation  modes  of  an  automotive 
engine  (Balluchi  et  al.,  2000),  or  the  phases  of  bipedal  walking  (Ames  et  al.,  2009).  Finally,  perhaps 
most  relevant  for  our  discussions,  they  have  been  used  as  a  mathematical  formalism  to  integrate 
discrete  and  continuous  abstractions  in  a  hierarchical  control  architecture  (Gollu  and  Varaiya,  1989; 
Lygeros,  1996;  Caines  and  Wei,  1998;  Alur  et  al.,  2001). 

For  the  purposes  of  controller  design,  a  hybrid  system  model  provides  us  with  an  abstraction  of 
the  interactions  between  the  high  level  and  low  level  control  layers.  In  particular,  the  mechanisms 
for  high  level  decision  making  can  be  abstracted  in  terms  of  a  discrete  transition  system,  while 
continuous behaviors resulting from high level commands can be abstracted in terms of
continuous state models associated with each of the discrete modes. The interactions between the control
layers are then captured through the possible dependence of discrete transitions on continuous state
variables, as well as the possible dependence of continuous dynamics on discrete state variables.
The various sources of uncertainty, such as unmodelled system dynamics or environment
disturbances, can be included as exogenous inputs or stochastic noise affecting the discrete or continuous
state  evolution. 

Within  the  context  of  a  hybrid  system  model,  the  problem  of  designing  control  policies  to 
satisfy  safety  or  reach-avoid  control  objectives  can  be  elegantly  posed  as  a  reachability  problem 
on  the  hybrid  state  space,  through  proper  interpretations  of  the  specifications  as  constraints  on  the 
discrete  and  continuous  states.  In  the  event  that  the  relevant  behaviors  of  the  underlying  control 
system are accurately described in terms of the hybrid system model, it can then be expected with a
high  degree  of  confidence  that  the  solution  to  a  hybrid  reachability  problem  will  satisfy  the  desired 
control  objectives  on  the  actual  system.  However,  given  the  inevitable  deviations  between  the 
complex  behaviors  of  the  actual  system  and  the  mathematical  properties  of  an  abstraction  that  is 
tractable  for  analysis  and  control,  one  should  not  expect  that  this  will  completely  eliminate  the  need 
for  formal  verification  and  validation.  Instead,  what  can  be  hoped  for  is  that  through  a  principled 
controller  design  approach  based  upon  mathematical  models  rather  than  heuristic  insights,  one  can 
reduce  the  prohibitive  amount  of  time  and  effort  that  are  currently  expended  on  the  verification  and 
validation  of  safety-critical  control  systems. 

1.3  Computational  Solutions  to  Reachability  Problems 

From the point of view of control theory, it has been recognized that hybrid reachability problems
are equivalent to optimal control or dynamic game problems, with real-valued cost functions
defined on the hybrid state space (see for example Asarin et al., 1995; Lygeros et al., 1999b; Tomlin
et al., 2000; Koutsoukos and Riley, 2006; Amin et al., 2006; Mohajerin Esfahani et al., 2011). In
fact, this equivalence is not particular to hybrid systems, but is rather a fundamental characteristic
of reachability problems. To illustrate this point, consider a control system of the form
$\dot{x} = f(x, u)$, $x(0) = x_0 \in \mathbb{R}^n$. With appropriate assumptions on the vector field $f$, the solution
trajectory $x(\cdot)$ of this system is uniquely determined by the initial condition $x_0$ and the choice of
controls $u(\cdot)$. Given a safe set $W \subset \mathbb{R}^n$, we can use the procedures described in Lygeros et al.
(1999b) to define a cost function $J(x_0, u)$, such that $J = 1$ if the trajectory corresponding to
$(x_0, u)$ satisfies $x(t) \in W$ for every $t$ over the time horizon of interest, and $J = 0$ otherwise. In
other words, $J = 1$ if and only if the safety specification is satisfied. Now consider the optimal
control problem $J^*(x_0) = \max_u J(x_0, u)$. Then verifying the safety property consists of computing
the value $J^*$ and finding the set of initial conditions such that $J^* = 1$, while controller design
consists of finding a maximizer for each of these initial conditions.
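
As a rough illustration of the indicator cost and the associated optimal control problem, the following Python sketch evaluates the cost by brute force for a toy problem. Everything specific here is an assumption made for illustration: the scalar dynamics dx/dt = x + u, the Euler discretization with step `DT`, the finite control set `CONTROLS`, the safe set [-1, 1], and the function names. It is not the construction of Lygeros et al. (1999b), only a sampled, finite-horizon analogue of it.

```python
from itertools import product

DT = 0.1                      # Euler step (assumed)
CONTROLS = (-0.5, 0.0, 0.5)   # finite control set (assumed)

def step(x, u):
    """One Euler step of the hypothetical unstable system dx/dt = x + u."""
    return x + DT * (x + u)

def in_safe_set(x):
    """Membership test for the assumed safe set [-1, 1]."""
    return -1.0 <= x <= 1.0

def safety_cost(x0, controls):
    """Indicator cost: 1 if the sampled trajectory stays safe at every step, else 0."""
    x = x0
    if not in_safe_set(x):
        return 0
    for u in controls:
        x = step(x, u)
        if not in_safe_set(x):
            return 0
    return 1

def optimal_safety_value(x0, horizon):
    """Maximize the indicator cost by enumerating all open-loop control sequences."""
    for seq in product(CONTROLS, repeat=horizon):
        if safety_cost(x0, seq) == 1:
            return 1, seq          # any sequence achieving cost 1 is a maximizer
    return 0, None

value_a, seq_a = optimal_safety_value(0.4, horizon=5)
value_b, seq_b = optimal_safety_value(0.9, horizon=5)
```

In this toy problem the optimal value is 1 from the initial condition 0.4 and 0 from 0.9, where even the strongest available control cannot keep the state below 1. The enumeration grows exponentially with the horizon, which is one reason to look for more structured solution methods.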

The advantage of this viewpoint is that finding computational algorithms solving hybrid
reachability problems becomes equivalent to finding computational algorithms solving optimal control
problems  (in  the  case  of  a  single  control  agent)  and  dynamic  game  problems  (in  the  case  of  a 
control  and  a  disturbance).  This  allows  a  control  engineer  to  tap  into  the  wealth  of  knowledge 
and  insight  that  has  accumulated  in  the  respective  fields  of  optimal  control  and  dynamic  games. 
It  then  comes  as  little  surprise  that  a  significant  number  of  computational  algorithms  that  have 
been  proposed  for  deterministic  or  probabilistic  reachability,  especially  in  the  context  of  systems 
with  continuous  states,  are  based  upon  either  the  dynamic  programming  principle  or  the  maximum 
principle (see for example Kurzhanski and Varaiya, 2000; Crück and Saint-Pierre, 2004; Mitchell
et  al.,  2005;  Hwang  et  al.,  2005;  Koutsoukos  and  Riley,  2006;  Abate  et  al.,  2007).  In  particular,  the 
computational  algorithms  for  controller  synthesis  discussed  in  this  dissertation  are  primarily  based 
upon  the  dynamic  programming  principle. 
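
To give a flavor of how the dynamic programming principle applies to a safety problem, the sketch below runs a backward recursion on a small finite-state system. The integer dynamics, the safe set, and all names are hypothetical choices for illustration; the algorithms in later chapters operate on continuous or hybrid state spaces and are considerably more involved.

```python
def safety_value_dp(safe_states, controls, step, horizon):
    """Backward recursion V_k(x) = 1_W(x) * max_u V_{k+1}(step(x, u)), with V_N = 1_W."""
    value = {x: 1 for x in safe_states}       # V_N; states outside W implicitly have value 0
    policy = []                               # policy[k][x]: controls that keep x safe from stage k on
    for _ in range(horizon):
        new_value, stage_policy = {}, {}
        for x in safe_states:
            good = [u for u in controls if value.get(step(x, u), 0) == 1]
            stage_policy[x] = good
            new_value[x] = 1 if good else 0
        value = new_value
        policy.insert(0, stage_policy)
    return value, policy

# Hypothetical example: integer state, unstable dynamics x+ = 2x + u, safe set W = {-3, ..., 3}.
W = range(-3, 4)
value, policy = safety_value_dp(W, (-1, 0, 1), lambda x, u: 2 * x + u, horizon=2)
safe_initial_states = sorted(x for x in W if value[x] == 1)
```

For this example the recursion returns the safe initial states -1, 0, and 1 over a horizon of two steps. Unlike enumeration over control sequences, the recursion visits each state once per stage and yields a set-valued feedback policy as a by-product.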

Despite  the  mathematical  elegance  of  an  optimal  control  or  dynamic  game  formulation  of  the 
hybrid  reachability  problem,  solving  such  a  problem  for  general  hybrid  systems  is  a  significant 
challenge  (Branicky  et  al.,  1998).  In  particular,  hybrid  optimal  control  problems  feature  both 
the  difficulties  of  nonlinear  optimal  control,  as  well  as  the  range  of  issues  introduced  by  discrete 
switching  between  modes  of  a  hybrid  system  (this  can  include  discontinuities  in  the  vector  field, 
reset  of  the  continuous  state,  or  even  changes  in  the  continuous  state  dimension  due  to  algebraic 
constraints).  This  then  motivates  the  study  of  subclasses  of  hybrid  systems  for  which  approximate 
solutions  to  reachability  problems  can  be  obtained. 

1.4  Consideration  of  Subclasses  of  Hybrid  Systems 

In  this  dissertation,  we  will  be  specifically  interested  in  controller  design  methods  for  the  classes 
of  continuous  time  switched  nonlinear  systems  (Part  I)  and  discrete  time  stochastic  hybrid  systems 
(Part  II).  The  motivations  for  studying  these  subclasses  of  hybrid  systems  are  discussed  below. 

A  switched  nonlinear  system  is  a  hybrid  system  whose  continuous  dynamics  switches  among 
a  finite  collection  of  continuous  vector  fields  according  to  a  discrete  transition  rule  (Liberzon  and 
Morse,  1999).  The  salient  characteristics  of  such  a  system  are:  1)  the  continuous  state  dimension 
is  the  same  across  the  discrete  modes;  2)  the  continuous  state  does  not  reset  as  a  discrete  transition 
is made. However, the vector field is in general discontinuous across a discrete transition. For
purposes of controller synthesis, we will consider the class of switched systems with controlled
switching among the set of discrete modes. Within each discrete mode, the vector field can be
nonlinear and subject to continuous disturbances with bounded magnitude. The underlying modeling
assumption  is  that  the  condition  for  switching  between  the  discrete  modes  is  a  design  parameter, 
rather  than  an  inherent  characteristic  of  the  physical  system.  This  removes  a  number  of  technical 
difficulties  associated  with  the  analysis  and  control  of  continuous  time  hybrid  systems,  at  the  cost 
of  restricting  system  dynamics  to  those  without  autonomous  switches.  However,  a  switched  system 
model  provides  a  fitting  abstraction  of  a  physical  system  whose  underlying  continuous  behavior 
can  be  described  in  terms  of  a  nonlinear  vector  field  (up  to  bounded  disturbances),  but  whose  high 
level control is performed by discrete switching among a finite set of low level controllers.
Examples of such systems can be found in aircraft conflict resolution (Tomlin et al., 2002), unmanned
aerial  vehicle  trajectory  control  (Frazzoli  et  al.,  2000),  and  robot  formation  control  (Fierro  et  al., 
2001). 
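
The following Python sketch simulates a toy switched nonlinear system of the kind described above. The two vector fields, the disturbance bound, the hysteresis-style switching rule, and the sampling interval are all assumptions chosen for illustration, not a model from the applications cited. The sketch shows a shared continuous state, a designer-chosen switching condition, a bounded disturbance, and no state reset at mode changes.

```python
import math

DT = 0.1        # sampling interval (assumed)
D_MAX = 0.2     # disturbance bound |d| <= D_MAX (assumed)

# One nonlinear vector field per discrete mode; the continuous state x is shared by all modes.
VECTOR_FIELDS = {
    "off":  lambda x, d: -0.005 * x**2 + d,
    "heat": lambda x, d: -0.005 * x**2 + 3.0 + d,
}

def switching_policy(mode, x):
    """Controlled switching: the switching condition is a design choice."""
    if x < 19.0:
        return "heat"
    if x > 21.0:
        return "off"
    return mode                      # otherwise keep the current mode

def simulate(x0, mode0, steps):
    """Euler simulation; x is never reset when the mode changes."""
    x, mode, history = x0, mode0, [(mode0, x0)]
    for k in range(steps):
        mode = switching_policy(mode, x)
        d = D_MAX * math.sin(k)      # one admissible disturbance signal
        x = x + DT * VECTOR_FIELDS[mode](x, d)
        history.append((mode, x))
    return history

history = simulate(x0=20.0, mode0="off", steps=500)
```

The switching rule here is hand-written; the point of the synthesis methods discussed in Part I is to compute such rules systematically, with guarantees against every admissible disturbance rather than a single simulated one.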

As compared with deterministic hybrid systems, which include switched nonlinear systems as a
special case, stochastic hybrid system models allow for a probabilistic description of the
uncertainties affecting system dynamics (Hu et al., 2000). This description can be obtained for example
from a statistical analysis of past data on system behavior. Such models have been used to study
control problems arising in air traffic management (Glover and Lygeros, 2004), communication
networks  (Hespanha,  2004),  and  systems  biology  (Hu  et  al.,  2004).  In  the  case  of  a  discrete  time 
stochastic  hybrid  system,  the  evolution  of  the  system  state  over  discrete  time  instants  is  assumed 
to  be  described  by  a  transition  probability  over  the  hybrid  state  space,  parameterized  by  the  current 
system  state  and  control  inputs  (Abate  et  al.,  2008).  This  results  in  a  discrete  time  Markov  process, 
for  which  a  rich  body  of  theory  has  been  built  up  in  the  study  of  stochastic  optimal  control  (see 
for  example  Bertsekas  and  Shreve,  1978;  Kumar  and  Varaiya,  1986).  Under  a  discrete  time  model, 
there  is  no  longer  the  notion  of  “continuity  in  time.”  As  a  result,  one  can  account  for  discrete 
transitions which are dependent on the continuous state, possible changes in continuous state
dimensions across discrete transitions, as well as resets in the continuous state, without significant
technical  difficulties.  However,  this  level  of  generality  comes  at  the  cost  of  abstracting  away  the 
possibly  rich  set  of  hybrid  system  behaviors  in  between  the  discrete  time  instants. 
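
As a minimal sketch of what a transition probability on a hybrid state space can look like, the Python code below samples trajectories from a hypothetical kernel. The two modes, the switching probabilities, the reset map, and the Gaussian noise level are invented for illustration and do not come from Abate et al. (2008). The sketch includes a discrete transition whose probability depends on the continuous state and a reset of the continuous state when the mode changes.

```python
import random

def sample_next_state(q, x, u, rng):
    """Draw (q', x') from a hypothetical stochastic kernel T(. | (q, x), u)."""
    # Discrete transition whose probability depends on the continuous state.
    p_switch = 0.8 if abs(x) > 1.0 else 0.05
    q_next = 1 - q if rng.random() < p_switch else q
    if q_next != q:
        x_next = 0.5 * x                       # reset of the continuous state on a mode change
    else:
        drift = 0.1 if q == 0 else -0.1        # mode-dependent continuous dynamics
        x_next = x + drift + u + rng.gauss(0.0, 0.05)
    return q_next, x_next

def simulate(q0, x0, policy, horizon, seed=0):
    """Sample one trajectory of the resulting discrete time Markov process."""
    rng = random.Random(seed)
    q, x = q0, x0
    trajectory = [(q, x)]
    for _ in range(horizon):
        q, x = sample_next_state(q, x, policy(q, x), rng)
        trajectory.append((q, x))
    return trajectory

trajectory = simulate(q0=0, x0=0.0, policy=lambda q, x: 0.0, horizon=20)
```

Because the next state is drawn from a distribution that depends only on the current hybrid state and control, the sampled process is Markov, which is what makes the stochastic optimal control machinery cited above applicable.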

As fields of study, the reachability of deterministic hybrid systems and stochastic hybrid
systems are currently at two significantly different stages of development. Due to the large body of
previous work which has focused on the modeling and analysis of deterministic hybrid systems,
the former has by now a significant number of methods and algorithms for the computation of
approximate reachable sets for a wide range of hybrid systems, with well-understood theoretical
and numerical properties (see for example Asarin et al., 2000a; Kurzhanski and Varaiya, 2000;
Bemporad et al., 2000b; Aubin et al., 2002; Chutinan and Krogh, 2003; Mitchell et al., 2005; Girard,
2005). On the other hand, the latter is still at a stage in which methods for formulating
probabilistic reachability problems are in the process of being proposed (Koutsoukos and Riley, 2006; Amin
et  al.,  2006),  algorithms  for  computing  approximations  to  the  reachability  probability  are  in  the 
process  of  being  devised  (Hu  et  al.,  2005;  Abate  et  al.,  2008),  and  their  theoretical  and  numerical 
properties  are  in  the  process  of  being  analyzed  (Abate  et  al.,  2010). 

Given the gap between current understanding of deterministic and probabilistic reachability of
hybrid systems, the discussions in Part I and Part II of this dissertation will correspondingly
differ in their focus with respect to aspects of the controller design problem. More specifically, our
discussion  of  reachability  problems  for  switched  nonlinear  systems  will  focus  on  the  derivation  of 
concrete  design  procedures  and  synthesis  algorithms  for  generating  feedback  controllers  that  can 
be  implemented  in  practical  applications,  based  upon  existing  techniques  for  computing  reachable 
sets.  In  comparison,  our  discussion  of  reachability  problems  for  stochastic  hybrid  systems  will  be 
somewhat more abstract, and focus instead on the formulation of probabilistic reachability
problems under various models of uncertainty, the construction of dynamic programming algorithms to
solve  these  problems,  and  a  foundational  understanding  of  the  theoretical  properties  and  practical 
implications  of  the  dynamic  programming  solution. 

1.5  Organization 

This  dissertation  covers  several  controller  design  methods  for  reachability  problems  that  arise  in 
the  context  of  switched  nonlinear  systems  and  discrete  time  stochastic  hybrid  systems.  Parts  of 
the  material  presented  here  have  appeared  previously  in  several  papers:  Ding  et  al.  (2008);  Ding 
and Tomlin (2010); Ding et al. (2011b,a); Kamgarpour et al. (2011); Ding et al. (2012). In the
following,  we  provide  an  overview  of  the  main  themes  from  each  of  the  subsequent  chapters. 

In chapter 2, we consider switched nonlinear systems whose discrete states represent the
sequential phases of a dynamic process. Within this context, a systematic procedure is proposed,
based  upon  a  hybrid  system  formalism,  for  carrying  out  controller  design  to  satisfy  sequential 
reachability specifications, namely specifications consisting of a sequence of safety and
reach-avoid objectives. This is motivated by maneuver sequence design problems for unmanned aerial
vehicles (UAVs) requiring robust operation guarantees. Through an appropriate choice of
switching policy, the problem is posed as one of continuous control design for each discrete mode to
ensure  both  individual  reachability  objectives,  as  well  as  proper  composition  between  successive 
modes  in  the  sequence.  This  design  task  is  addressed  using  computational  tools  from  nonlinear 
reachability  analysis.  The  proposed  methodology  is  illustrated  through  the  example  of  automated 
aerial  refueling  (AAR). 

In  chapter  3,  we  shift  the  discussion  to  switched  systems  whose  discrete  states  represent  the 
set  of  qualitative  control  choices  available  to  a  high  level  controller.  For  this  class  of  systems, 
controller  synthesis  algorithms  are  proposed  for  computing  feedback  control  policies  satisfying 
safety and reach-avoid specifications under worst-case disturbance realizations. For practical
purposes, the problem is posed in a sampled-data setting in which measurements of the system state
are obtained at regular sampling instants. The controller synthesis algorithms proceed by
iterative reachability calculations over sampling intervals, returning as output a collection of reachable
sets representing the control policy. These reachable sets can then be stored as lookup tables for
online computation of control inputs in a sampled data setting. This methodology is applied to a
simulation example of aircraft conflict resolution, as well as an experimental example of quadrotor
hover control. The AAR example is also revisited to illustrate how the controller synthesis
procedures can be applied to the design of switching controllers for individual phases of a sequential
reachability problem.

In  chapter  4,  we  describe  a  framework  for  analyzing  probabilistic  reachability  problems  for 
discrete time stochastic hybrid systems (DTSHS) within a dynamic games setting. In
particular, we consider zero-sum stochastic game formulations of the safety and reach-avoid problems,
and  discuss  dynamic  programming  algorithms  for  computing  the  optimal  probability  of  satisfying 
the  reachability  specifications,  subject  to  the  worst-case  behavior  of  a  rational  adversary.  This  is 
motivated  by  instances  of  hybrid  system  models  which  feature  a  combination  of  stochastic  and 
adversarial uncertainties. The problem is first posed in the finite horizon case, assuming an
asymmetric information pattern favoring the adversary (as motivated by robust control problems). The
implications of considering infinite horizon problems, as well as stochastic game formulations with
symmetric information patterns, are discussed in subsequent sections. In particular, the existence of
a  value  in  a  symmetric  stochastic  game  in  general  requires  the  consideration  of  randomized  control 
policies. 

In chapter 5, we focus our attention on the issue of partial observability in probabilistic reachability
problems. We proceed by formulating the safety and reach-avoid problems for DTSHS
as stochastic optimal control problems under partial observation, and show that even though they
feature a multiplicative cost structure, they are equivalent to additive cost problems when the state
space is augmented with an auxiliary binary random variable. This allows us to derive a sufficient
statistic for probabilistic reachability problems as a probability distribution evolving on the
augmented state space, as well as an abstract dynamic programming algorithm for computing the
optimal probability of satisfying the reachability specifications. Issues of computation and implementation
are discussed in terms of the special cases of finite state Markov decision processes and
hybrid state models with probability density descriptions. In particular, practical implementation
of the control and estimation algorithms hinges on efficient representations of the augmented probability
distribution, which suggests the need for a deeper understanding of hybrid state estimation.

In chapter 6, we close with a summary of the main results presented in the dissertation, as
well as some thoughts on directions for future work. These directions include investigations into
computationally  efficient  methods  for  deterministic  and  probabilistic  reachability,  extensions  of 
the  controller  synthesis  methods  for  switched  systems  to  handle  autonomous  switching,  and  the 
consideration  of  multi-objective  problems  and  temporal  objectives  in  reachability  specifications. 
Most  of  the  technical  results  and  proofs  in  this  dissertation  are  embedded  within  the  main  text,  as 
they  sometimes  provide  insight  into  particular  aspects  of  the  controller  design  procedure.  However, 
some  of  the  lengthy  proofs  related  to  measurability  issues  in  Part  II  of  the  dissertation  can  be  found 
in the appendices.

Part I

Switched Nonlinear Systems

Chapter 2

Design  Procedure  for  Sequential 
Reachability  Specifications 

2.1  Motivation  and  Overview  of  Design  Methodology 

This  chapter  discusses  a  controller  design  procedure  for  sequential  reachability  specifications  in  the 
context  of  switched  nonlinear  systems.  In  particular,  we  consider  switched  systems  whose  discrete 
mode  transitions  follow  a  predefined  sequence,  and  within  each  discrete  mode,  the  objective  is 
to  satisfy  either  a  safety  or  reach-avoid  specification  defined  in  terms  of  the  continuous  system 
trajectory,  possibly  subject  to  bounded  continuous  disturbances.  The  sequential  structure  of  the 
system model provides an abstraction for dynamic processes whose flow follows a temporally
ordered sequence of qualitative phases, for example certain manufacturing or chemical processes.
The  reachability  specification  is  then  a  description  of  the  control  objective  in  each  phase  of  the 
process  flow.  Our  primary  motivation  for  studying  such  problems  comes  from  automation  of  flight 
maneuver  sequences  for  unmanned  aerial  vehicles  (UAVs). 

2.1.1  Automation  of  Flight  Maneuver  Sequences 

In modern autonomous flight systems, the tasks of management and control of aircraft are frequently
distributed between an onboard autonomous controller and external human operators or
supervisors.  In  safety-critical  scenarios,  high  level  decisions  on  how  and  when  flight  maneuvers 
should  be  carried  out  currently  rest  almost  exclusively  with  trained  human  operators,  while  the  task 
of  ensuring  low  level  specifications  such  as  flight  envelope  protection  is  delegated  to  the  onboard 
flight  control  system  (Yavrucuk  et  al.,  2009).  However,  as  one  pushes  towards  increased  levels  of 
autonomy  for  UAV  operation,  it  becomes  important  to  investigate  methods  for  incorporating  some 
of  the  high  level  decision  making  capabilities  into  the  onboard  UAV  control  system. 

In this chapter, we will restrict our attention to high level specifications consisting of an ordered
sequence of waypoints that the UAV must reach, while satisfying a safety constraint, for
example avoiding a collision with another aircraft. Following this specification, one can separate
the control task into a sequence of qualitative phases, with each phase corresponding to a flight maneuver
in which the objective is to either reach a target waypoint in finite time while satisfying a
safety constraint (i.e. reach-avoid problem), or to loiter within a neighborhood of the waypoint (i.e.
safety/invariance problem). The transitions between maneuvers can be controlled by human operators
or initiated autonomously by the UAV. This then results in a sequential reachability problem.
In the following, we discuss a practical example of this type of specification.

The  scenario  is  that  of  Automated  Aerial  Refueling  (AAR).  Currently,  manned  military  aircraft 
which  undergo  long  range  missions  are  routinely  refueled  in  mid-air  by  tanker  aircraft.  As  the  use 
of  UAVs  becomes  increasingly  prevalent,  there  is  an  ongoing  effort  to  introduce  this  capability 
into  UAV  operations,  ideally  with  a  minimal  amount  of  supervision  by  human  operators  (see  for 
example  Valasek  et  al.,  2002;  Nalepka  and  Hinchman,  2005;  Jin  et  al.,  2006;  Ross  et  al.,  2006).  A 
conceptual  illustration  of  the  AAR  scenario  is  shown  in  Figure  2.1. 


Figure  2.1:  Conceptual  illustration  of  automated  aerial  refueling  scenario. 

During  a  refueling  operation,  a  UAV  detaches  from  its  formation,  and  approaches  the  rear  of 
a  tanker  aircraft  for  refueling.  The  boom  operator  onboard  the  tanker  then  lowers  a  fuel  boom  to 
refuel  the  UAV;  once  the  refueling  is  complete,  the  operator  disconnects  the  boom  and  the  UAV 
breaks  away  from  the  tanker  to  rejoin  its  formation.  This  description  naturally  decomposes  AAR 
into several distinct phases, namely an “approach tanker” phase, a “refueling” phase, and a “rejoin
formation” phase. To introduce further structure into the refueling operation, the approach
and rejoin phases can be separated into a sequence of flight maneuvers in which the objective is
to reach some target location relative to the tanker aircraft, while avoiding collisions. The AAR
scenario can then be modeled by a switched system whose discrete dynamics consist of the sequential
transitions through the flight maneuvers, and whose continuous dynamics consist of the relative
kinematics between the tanker and the UAV in executing the respective maneuvers. The controller
design problem for this scenario can then be posed as a sequential reachability problem in which
the  objective  of  each  maneuver  is  to  reach  a  waypoint  location  relative  to  the  tanker  aircraft,  while 
avoiding  a  collision,  namely  a  reach-avoid  objective.  If  one  were  to  consider  possible  environment 
disturbances  such  as  wind  effects  or  variations  in  tanker  speed,  then  this  specification  would  need 
to be satisfied subject to the worst-case realizations of the disturbances.

The scenario described above is in large part that of an autonomous AAR procedure. Namely,
except  for  the  initiation  of  the  refueling  operation  and  the  actual  refueling  of  the  UAV  by  the 
boom  operator,  the  execution  of  the  maneuver  sequence  does  not  require  any  additional  human 
intervention.  However,  there  may  be  cases  in  which  either  the  UAV  operator  or  boom  operator 
would  like  to  have  an  input  on  when  the  UAV  is  allowed  to  transition  from  one  maneuver  to  the 
next. This could arise for example from the need to interrupt the refueling sequence due to severe
wind turbulence beyond that considered for the autonomous AAR design, or to have the UAV
dwell  within  the  vicinity  of  the  fuel  boom  while  the  boom  operator  refuels  the  UAV.  For  such  cases, 
one  can  insert  intermediate  maneuvers  into  the  refueling  sequence  with  the  objective  of  keeping 
the  UAV  in  a  neighborhood  of  each  waypoint  while  awaiting  operator  confirmation  to  perform  the 
next  maneuver.  The  resulting  reachability  problem  then  consists  of  a  sequence  of  reach-avoid  and 
safety/invariance  objectives. 

The  development  of  the  AAR  scenario  for  this  research  effort  was  carried  out  in  conjunction 
with  Boeing  Research  &  Technology,  and  the  Air  Force  Research  Laboratory  (AFRL),  through 
the  Certification  Technologies  for  Flight  Critical  Systems  (CerTA  FCS)  project.  The  author  would 
like  to  gratefully  acknowledge  the  contributions  of  Jim  Barhorst,  Jim  Paunicka,  and  Doug  Stuart 
of  Boeing  Research  &  Technology  for  their  valuable  feedback  and  suggestions  in  formulating  the 
AAR scenario. Also, many of the research ideas which led to the work described in this chapter
were conceived in David Homan’s yearly meetings on Verification and Validation at Wright-
Patterson  Air  Force  Base  in  Dayton,  OH.  In  these  meetings,  our  effort  was  influenced  by  the 
conversations  and  presentations  of  many  of  the  participants,  and  the  author  is  grateful  for  their 
contributions. 

2.1.2  Methodology  Overview 

Our  approach  to  the  sequential  reachability  problem  is  to  pose  it  as  a  hybrid  system  design  problem. 
Within  this  framework,  the  design  parameters  include  the  switching  conditions  for  the  sequence  of 
discrete  modes,  as  well  as  the  continuous  control  law  within  each  of  the  discrete  modes.  As  will  be 
discussed,  through  a  judicious  choice  of  switching  condition,  one  can  isolate  the  problem  to  one  of 
continuous  control  design  for  the  individual  discrete  modes.  In  particular,  the  continuous  control 
law  in  each  mode  needs  to  satisfy 

1.  the  reachability  specification  given  for  that  mode; 

2.  a  compatibility  condition  to  ensure  that  the  sequence  of  modes  can  be  properly  composed. 

It turns out that satisfying these requirements can be formulated as a continuous time reachability
problem. As such, we can use computational reachability analysis for continuous time systems
as a design tool for the continuous control laws. This allows us to check whether a given control
law satisfies the desired reachability specifications without the need to resort to exhaustive simulation
studies. Due to our consideration of nonlinear continuous dynamics subject to continuous
disturbance,  the  reachability  calculations  will  be  carried  out  using  a  method  based  upon  numerical 
solutions of Hamilton-Jacobi partial differential equations (PDEs) (Mitchell et al., 2005).

To discuss this in a more concrete setting, consider again the aerial refueling scenario. In
the case that the sequence of refueling maneuvers is to be performed autonomously, a reasonable
choice of switching conditions is to specify that as soon as the UAV reaches a given waypoint, it
will immediately transition to the next maneuver. A computational reachability analysis can then
be performed for each flight maneuver in the refueling sequence to determine 1) the capture set:
the set of aircraft states from which a maneuver can be completed within a finite time horizon;
and 2) the collision set: the set of aircraft states from which the trajectory of a flight maneuver
passes through a collision zone centered on the tanker aircraft. At design time, the capture sets and
collision sets computed for the various maneuvers in the AAR sequence can be used to guide the
choice of maneuver control laws so as to ensure that each maneuver terminates in an aircraft state
which satisfies the reach-avoid objective of the next maneuver (thus allowing the next maneuver to
be  feasibly  initiated).  Furthermore,  through  appropriate  modifications  of  the  reachability  analysis, 
the  effects  of  bounded  environment  disturbances  can  also  be  taken  into  account.  However,  in  such 
cases,  the  resulting  design  of  maneuver  control  laws  is  in  general  more  conservative  than  the  case 
in  which  the  robustness  factors  are  not  considered. 
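
The compatibility requirement between consecutive maneuvers can be illustrated on a gridded state space. The following is a minimal sketch, assuming that the set of states in which one maneuver terminates, and the capture and collision sets of the next maneuver, have already been computed (for example by an H-J solver) and exported as sets of grid-cell indices. The function name `is_composable` and the toy one-dimensional sets are hypothetical and are not taken from the AAR design itself.

```python
def is_composable(target_prev, capture_next, collision_next):
    """Return True if every cell in which the previous maneuver may terminate
    lies in the capture set of the next maneuver and outside its collision set."""
    feasible_next = capture_next - collision_next
    return target_prev <= feasible_next  # subset test


# Toy 1-D example: cells are integer indices along a line (assumed data).
target_1 = {4, 5}             # cells where maneuver 1 terminates
capture_2 = set(range(2, 9))  # cells from which maneuver 2 reaches its waypoint
collision_2 = {7, 8}          # cells from which maneuver 2 enters the collision zone

ok = is_composable(target_1, capture_2, collision_2)   # terminal cells are feasible
bad = is_composable({6, 7}, capture_2, collision_2)    # cell 7 leads to a collision
```

In this toy instance the first check succeeds and the second fails, which at design time would suggest modifying either the control law of the second maneuver or the terminal set of the first.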

2.1.3  Organization 

The  organization  of  this  chapter  is  as  follows.  Section  2.2  provides  an  overview  of  related  work 
in the domain of formal verification and mode sequence design. Section 2.3 discusses a hybrid
formalism for the class of switched systems under consideration. Section 2.4 provides formal
statements of two types of sequential reachability problems. Section 2.5 briefly reviews the
method of Hamilton-Jacobi reachability for nonlinear continuous systems. Section 2.6 introduces
a reachability-based procedure for performing controller design to satisfy sequential reachability
specifications. Section 2.7 discusses the use of reachable sets as an aid to human decision making
in recovering from a class of fault conditions occurring during run-time, in particular that of
improper  initialization.  These  methods  are  then  specialized  to  the  particular  case  of  automated 
aerial  refueling,  and  the  results  of  the  controller  design  procedure  along  with  simulated  scenarios 
are  presented  in  section  2.8. 

2.2  Related  Work 

2.2.1 Hamilton-Jacobi Reachability

The method of Hamilton-Jacobi (H-J) reachability was developed for computing reachable sets for
continuous time nonlinear systems, under a dynamic games framework (Mitchell et al., 2005). In
the work by Tomlin et al. (2003), one can find a comprehensive overview of the computational
techniques underlying H-J reachability, as well as its use in analyzing and verifying continuous time
nonlinear systems and hybrid systems.

This  method  has  seen  successes  in  numerous  aeronautical  applications.  In  the  work  by  Mitchell 
et al. (2005), the authors present a method for detecting possible “loss of separation” between pairs
of aircraft over a given airspace, based upon backward reachable sets computed using H-J PDEs.
Using  this  formulation  of  the  collision  avoidance  problem,  the  reachable  set  method  has  been  used 
to  verify  safety  of  conflict  resolution  aircraft  maneuvers  (Tomlin  et  al.,  2001),  and  closely  spaced 
parallel  approaches  for  airport  runways  (Teo  and  Tomlin,  2003).  The  results  of  the  reachability 
calculations  were  validated  in  extensive  simulations  as  well  as  UAV  flight  experiments  (Jang  and 
Tomlin,  2005;  Teo,  2005).  While  the  focus  of  these  previous  applications  lies  largely  in  safety 
verification,  the  work  described  in  this  chapter  proposes  a  method  for  using  reachability  analysis 
as  a  design  tool  for  choosing  the  continuous  control  laws  of  a  maneuver  sequence  so  as  to  satisfy 
the  desired  specifications. 

In systems that involve human-automation interactions, H-J reachability has also been successfully
demonstrated as a method for informing human decisions. In the work by Oishi et al. (2002),
the authors use reachability analysis to determine whether the pilot display of a civil jet aircraft
contained enough information for the pilot to safely perform a Take-off/Go-Around (TO/GA) maneuver
from a Flare landing maneuver. In another example, as described in Sprinkle et al. (2005),
reachable  sets  computed  using  H-J  methods  are  used  to  inform  decisions  on  the  re-initiation  of  a 
landing  maneuver  during  TO/GA,  and  the  results  were  demonstrated  on  a  fixed-wing  UAV  (T-33). 
Building  upon  these  previous  works,  this  chapter  also  discusses  an  approach  for  using  reachable 
sets  as  a  visual  tool  for  guiding  human  operator  decisions  in  the  scenario  that  a  maneuver  sequence 
is  improperly  initialized. 

2.2.2  Alternative  Reachability  Approaches 

Aside from H-J reachability, there is a myriad of alternative approaches in the domain of reachable
set based system verification for hybrid systems. The work considering timed automata and
linear  hybrid  automata  includes  seminal  papers  by  Alur  and  Dill  (1994)  and  Henzinger  (1996). 
Results have been generalized to linear and nonlinear continuous dynamics, with supporting computational
tools (Asarin et al., 2000a; Botchkarev and Tripakis, 2000; Kurzhanski and Varaiya,
2000; Bemporad et al., 2000b; Aubin et al., 2002; Chutinan and Krogh, 2003; Girard, 2005; Han
and Krogh, 2006). Methods that operate on system abstractions can reduce computational complexity,
including simulation and bisimulation relations (Alur et al., 2000; Haghverdi et al., 2005;
Girard et al., 2008), which are used to construct discrete abstractions of hybrid system dynamics.
In comparison, the H-J method has the advantage of being able to handle non-convex sets,
nonlinear  continuous  dynamics,  and  differential  games,  while  providing  subgrid  accuracy  using 
implementations  of  the  level  set  methods  (Sethian,  1999;  Osher  and  Fedkiw,  2002).  Furthermore, 
it  is  versatile  with  respect  to  the  range  of  reachable  set  computations  that  can  be  performed.  This 
includes  computations  under  forward  propagation  or  backward  propagation  in  time,  with  either 
existentially  quantified  (i.e.  reach  for  some  input)  or  universally  quantified  inputs  (i.e.  reach  for  all 
inputs).  This  feature  becomes  important  when  one  would  like  to  perform  reachability  computation 
under disturbances, as capture set computation would require universally quantified disturbance inputs,
while unsafe set computation would require existentially quantified disturbance inputs. The
benefits of the H-J method, however, come at the cost of higher computational complexity with
respect  to  some  of  the  alternative  reachability  methods. 
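
The different roles of the two quantifiers can be seen in a finite-state, discrete-time analogue of the backward reachability computation. The sketch below is only an illustration under stated assumptions: the closed-loop dynamics `step`, the grid of eleven cells, and the disturbance values are hypothetical, and the iteration is a simple set recursion rather than an H-J PDE solution. Passing `all` quantifies the disturbance universally (capture set), while passing `any` quantifies it existentially (unsafe set).

```python
STATES = range(11)      # grid cells 0..10 (assumed toy state space)
DISTURBANCES = (0, 1)   # bounded disturbance values (assumed)


def step(x, d):
    """Hypothetical closed-loop dynamics: move two cells toward 0, with the
    disturbance d pushing one cell back; the state is clipped to the grid."""
    return min(10, max(0, x - 2 + d))


def backward_reach(target, steps, quantifier):
    """Iterate R <- R ∪ {x : quantifier over d of step(x, d) in R}."""
    reach = set(target)
    for _ in range(steps):
        reach |= {x for x in STATES
                  if quantifier(step(x, d) in reach for d in DISTURBANCES)}
    return reach


# Capture set: the target {0, 1} is reached for ALL disturbance realizations.
capture = backward_reach({0, 1}, steps=3, quantifier=all)
# Unsafe set: the collision cell {5} is reached for SOME disturbance realization.
unsafe = backward_reach({5}, steps=3, quantifier=any)
```

With these toy dynamics the capture set grows by one cell per step, since the worst-case disturbance slows the approach, whereas the unsafe set grows by two cells per step, since any disturbance realization leading to a collision is counted.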

In reachability work relating to stochastic systems, Prandini and Hu (2006) discuss the use of
Markov chains to determine the reachability of some stochastic system in some lookahead time
(potentially infinite). Air traffic management as a driving example for distributed control and
stochastic analysis of safety-critical real-time systems is demonstrated in the HYBRIDGE report
(Blom  and  Lygeros,  2005).  In  many  of  these  applications,  events  that  jeopardize  the  safety  of  the 
system  are  rare,  and  using  probabilistic  methods  such  as  Monte  Carlo  simulations  (Blom  et  al., 
2007),  it  is  possible  to  estimate  the  probability  of  these  events  through  stochastic  reachability  and 
obtain  some  measure  of  confidence  in  the  safety  of  a  system  design  (Blom  et  al.,  2009).  On  the 
other  hand,  for  systems  with  environment  disturbances  that  are  known  to  lie  within  certain  bounds, 
deterministic  reachability  can  be  used  to  provide  stronger  performance  guarantees  for  relevant 
disturbances  such  as  perturbations  in  velocity  or  heading. 

2.2.3  Flight  Maneuver  Design  Approaches 

State  feedback  is  a  common  approach  to  the  design  and  implementation  of  flight  maneuvers.  In 
general, a trajectory is generated (or designed) and the vehicle tracks this trajectory based on an onboard
guidance and navigation system. Depending on the maneuver, this trajectory may be globally
fixed (for example, a glideslope for landing) or defined from a location decided at flight time (for
example, a waypoint). For certain maneuvers, additional scrutiny is given due to their proximity
to regions of stall or other vehicles. Details for optimal Go-Around and Flare maneuvers are given
in the work of Buell and Leondes (1973). Interestingly, transitions between these maneuvers can
also  be  discussed  in  the  framework  of  reachability,  as  in  the  previously  mentioned  work  by  Oishi 
et  al.  (2002). 

Alternatively,  maneuver  sequence  synthesis  may  be  performed  at  runtime  using  path-planning 
algorithms.  In  the  work  by  Bottasso  et  al.  (2008),  the  authors  demonstrate  smooth  path  planning 
using  motion  primitives  to  pass  through  a  series  of  waypoints  constituting  a  track.  This  approach 
is  related  to  that  applied  by  Frazzoli  et  al.  (2005)  and  Koo  et  al.  (2001),  both  of  which  are  focused 
on rotorcraft. Although these algorithms are computationally efficient, providing robust performance
guarantees is often complicated by the presence of model uncertainty and environment
disturbances at runtime.

To  address  safety  concerns,  safe  maneuvers  with  real-time  trajectory  generation  were  shown 
by  Waydo  et  al.  (2007)  for  the  case  of  formation  flight  with  an  autonomous  vehicle,  where  several 
control  modes  are  used  depending  on  loss  of  communication  with  a  manned  vehicle.  This  approach 
was  proved  safe  using  runtime  predictive  control,  requiring  a  solution  to  the  stationary  Riccati 
equation  over  (essentially)  infinite  time.  It  is  interesting  to  compare  this  approach  to  backward 
reachability,  as  it  can  be  essentially  thought  of  as  a  forward  reachability  calculation  to  validate  a 
specific  trajectory  (rather  than  validate  all  potential  trajectories  using  backward  reachability). 

As an alternative, Lyapunov functions can also be used to provide robust guarantees on the
closed-loop performance of the system under a given controller design. Relevant to the work in this
chapter is a Lyapunov-based method proposed by Burridge et al. (1999), in the context of motion
planning applications, for composing sequences of local feedback controllers to achieve a desired
final configuration. Under this method, the authors construct local controllers whose domains of
attraction are estimated from the level sets of Lyapunov functions. Sequential composition is then
performed  by  ensuring  that  the  goal  set  of  a  given  controller  is  contained  within  the  domain  of 
attraction  of  the  next  controller  in  the  sequence. 

For applications with nonlinear continuous dynamics, constructing appropriate Lyapunov functions
satisfying the desired stability objectives can be a non-trivial task. Depending on the choice
of Lyapunov functions, estimates of the domain of attraction can also be quite conservative, especially
when system dynamics are perturbed by disturbances. By using H-J methods to generate
the relevant reachable sets, the methodology proposed in this chapter avoids the need for selecting
Lyapunov functions, while reducing the conservatism in estimating the domain of attraction. Also,
it is worth noting that local controllers produced through Lyapunov methods can be evaluated using
Hamilton-Jacobi reachability for satisfaction of target attainability and safety objectives. Thus,
the  presented  approach  is  not  meant  to  supplant  existing  methods  for  robust  nonlinear  controller 
design,  but  to  augment  them. 

2.3  Hybrid  Model  of  Sequential  Transition  Systems 

In this section, we will introduce the necessary modeling formalism for the controller design problem.
In particular, the focus will be on sequential transition systems, which are switched nonlinear
systems whose discrete mode transitions follow a pre-defined sequence. As preliminaries, we
will first review a general hybrid system model, based upon the formalism described in Lygeros
et al. (1999b) and Tomlin et al. (2000). Sequential transition systems will then be discussed as
instantiations of this general model.

2.3.1  General  Hybrid  Automaton 

Over the several decades of research on hybrid systems, numerous modeling frameworks have
been introduced in the literature. For our purposes, the formalism proposed in Lygeros et al. (1999b)
and Tomlin et al. (2000) provides a sufficiently rich class of models to describe the behavior of
sequential transition systems. The description given below is correspondingly adapted from these
previous  works. 

Definition 2.1 (Hybrid Automaton). A hybrid automaton is a tuple

$$\mathcal{H} = (Q, X, \Sigma, In, Init, f, Dom, Reset),$$

defined as follows.

• Discrete state space $Q := \{q_1, q_2, \ldots, q_m\}$, $m \in \mathbb{N}$.

• Continuous state space $X := \mathbb{R}^n$, $n \in \mathbb{N}$.

• Discrete input space $\Sigma := \Sigma_1 \times \Sigma_2$, where $\Sigma_1$ is the set of discrete control inputs and $\Sigma_2$ is the
set of discrete disturbance inputs.

• Continuous input space $In := U \times D$, where $U$ is the set of continuous control inputs and $D$
is the set of continuous disturbance inputs.

• Admissible initial conditions $Init \subset Q \times X$.

• Vector field $f : Q \times X \times In \to X$, describing the continuous state evolution.

• Domain $Dom \subset Q \times X \times \Sigma \times In$, describing the domain on which continuous state evolution
is permitted.

• Reset relation $Reset : Q \times X \times \Sigma \times In \to 2^{Q \times X}$, describing the subset of the hybrid state space
that the system state is permitted to transition to in the event of a discrete jump.

In order to ensure the existence and uniqueness of the continuous trajectory under the vector field
$f$, we will need $f$ to satisfy certain regularity assumptions.

Assumption 2.1. The vector field $f$ is continuous and bounded, and for fixed $q \in Q$, $(u, d) \in In$,
the function $x \to f(q, x, u, d)$ is Lipschitz continuous.

Roughly speaking, an execution of the hybrid automaton proceeds as follows. From an initial
condition $(q_0, x_0) \in Init$, the continuous trajectory $x(\cdot)$ evolves according to the ordinary differential
equation $\dot{x} = f(q_0, x, u, d)$, $x(0) = x_0$, while the discrete state remains constant, as long as
$(q_0, x(t), \sigma_1(t), \sigma_2(t), u(t), d(t)) \in Dom$. At the first time instant $t_1$ when this condition no longer
holds, the system state takes a discrete jump to

$$(q', x') \in Reset(q_0, x(t_1), \sigma_1(t_1), \sigma_2(t_1), u(t_1), d(t_1)),$$

and the cycle repeats. To define this more formally, we will need the notion of a hybrid time
trajectory.

Definition 2.2 (Hybrid Time Trajectory (Lygeros et al., 1999b; Tomlin et al., 2000)). A hybrid
time trajectory $\tau = \{L_k\}_{k=1}^{N}$ is a finite ($N < \infty$) or infinite ($N = \infty$) sequence of intervals of the real
line satisfying:

• $L_k = [\tau_k, \tau_k']$ for $k < N$, and if $N < \infty$, either $L_N = [\tau_N, \tau_N']$ or $L_N = [\tau_N, \tau_N')$;

• $\forall k = 1, \ldots, N$, $\tau_k \leq \tau_k' = \tau_{k+1}$.
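
As an illustration of the two conditions in the definition of a hybrid time trajectory, the following minimal sketch checks them for a finite sequence of intervals. The representation of each interval as a pair `(tau_k, tau_k_prime)` and the function name are assumptions made for this example; the distinction between a closed and a right-open final interval is not modeled.

```python
def is_hybrid_time_trajectory(intervals):
    """Check a finite list of (tau_k, tau_k_prime) pairs, given in order.

    Each interval must satisfy tau_k <= tau_k_prime, and each interval must
    end exactly where the next one starts (tau_k_prime == tau_{k+1})."""
    for k, (start, end) in enumerate(intervals):
        if start > end:
            return False
        if k + 1 < len(intervals) and end != intervals[k + 1][0]:
            return False
    return True


# The middle interval has zero length: two discrete jumps in immediate succession.
valid = is_hybrid_time_trajectory([(0.0, 1.5), (1.5, 1.5), (1.5, 4.0)])
# A gap between consecutive intervals violates tau_k_prime == tau_{k+1}.
gap = is_hybrid_time_trajectory([(0.0, 1.0), (2.0, 3.0)])
```

The first sequence is accepted even though one of its intervals is a single point, which is how consecutive discrete jumps at the same time instant are represented.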

As in Lygeros et al. (1999b) and Tomlin et al. (2000), for a given $t \in \mathbb{R}$ and a hybrid time
trajectory $\tau$, we will use $t \in \tau$ to refer to $t \in L_k$ for some $k = 1, \ldots, N$. Informally, a hybrid time
trajectory is a sequence of time intervals on which continuous evolution takes place. However,
there may be cases in which another discrete jump takes place immediately after a discrete jump.
In such cases, there could be time intervals $L_k$ with measure zero. We can now give a definition for
the execution of a hybrid system.

Definition 2.3 (Execution of Hybrid Automaton (Lygeros et al., 1999b; Tomlin et al., 2000)). An
execution of a hybrid automaton is a collection $\chi = (\tau, q, x, \sigma, u, d)$ with $(q, x) : \tau \to Q \times X$ and
$(\sigma, u, d) : \tau \to \Sigma \times In$ satisfying:

• $(q(\tau_1), x(\tau_1)) \in Init$;

• $\forall k = 1, \ldots, N$, $(q(\tau_{k+1}), x(\tau_{k+1})) \in Reset(q(\tau_k'), x(\tau_k'), \sigma(\tau_k'), u(\tau_k'), d(\tau_k'))$;

• On every interval $L_k = [\tau_k, \tau_k']$ such that $\tau_k < \tau_k'$, $q(\cdot)$ and $\sigma(\cdot)$ are constant, the continuous
input $(u(\cdot), d(\cdot))$ is piecewise continuous, $x(\cdot)$ is the solution to $\dot{x} = f(q, x, u, d)$, and
$(q(t), x(t), \sigma(t), u(t), d(t)) \in Dom$, $\forall t \in L_k$.

An execution $\chi = (\tau, q, x, \sigma, u, d)$ is said to be finite if $\tau$ is a finite sequence of time intervals
with the last interval being closed. An execution $\chi = (\tau, q, x, \sigma, u, d)$ is said to be infinite if either
$\tau$ is an infinite sequence of time intervals, or if $\sum_{k=1}^{N} (\tau_k' - \tau_k) = \infty$.

From Definition 2.3, it can be seen that the reset relation $Reset$ specifies conditions under which
discrete jumps are enabled. Namely, for a given state $(q, x) \in Q \times X$ and an input $(\sigma, u, d) \in \Sigma \times In$,
if the set $Reset(q, x, \sigma, u, d)$ is nonempty, then a discrete jump is permitted and the jump can be
taken to any state in $Reset(q, x, \sigma, u, d)$. However, one may also choose not to take the jump, as
long as $(q, x, \sigma, u, d) \in Dom$ is satisfied, namely the state and input lie in the domain on which
continuous evolution is allowed. Thus, it can be seen that there is a degree of nondeterminism in the
executions of a hybrid automaton. In particular, for a given hybrid automaton $\mathcal{H}$, initial condition
$(q_0, x_0) \in Init$, and inputs $\sigma(\cdot)$, $u(\cdot)$, $d(\cdot)$, there may not exist, in general, an infinite execution, and
if one exists, it may not be unique. A complete discussion of the issue of existence and uniqueness
is somewhat involved for a general hybrid automaton. The interested reader is referred to the work
of Lygeros et al. (1999a), in which some sufficient conditions for the existence and uniqueness of
infinite executions are given.

2.3.2  Automated  Sequential  Transition  Systems 

We first develop a model for sequential transition systems in which there are no external discrete
inputs, namely $\Sigma_2 = \emptyset$. Through an interpretation of $\Sigma_2$ as the set of commands issued by an
external human operator, this corresponds to a system operating under automated selections of
control inputs $\sigma_1 \in \Sigma_1$ and $u \in U$. In other words, once the system has been initialized within
the initial set $\mathrm{Init}$, the system would proceed through the sequence of operating modes without
further intervention from the human operator. However, in the execution of this mode sequence,
the continuous trajectory may still be perturbed by environment disturbances, which are modeled
as continuous disturbance inputs. We will refer to this class of systems as automated sequential
transition systems.

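Definition 2.4 (Automated Sequential Transition System). An automated sequential transition system
is a hybrid automaton $\mathcal{H}$ with

• Switching control space: $\Sigma = \Sigma_1 = \{\sigma_1, \sigma_2, \ldots, \sigma_m\}$;

• Initial conditions: $\mathrm{Init} = q_1 \times X_0$, with $X_0 \subset X$;

• Mode domains: $\mathrm{Dom} = \bigcup_{i=1}^{m} q_i \times X \times \Sigma^i \times \mathrm{In}$, where $\Sigma^i = \Sigma \setminus \{\sigma_{i+1}\}$ for $i = 1, \ldots, m-1$ and
$\Sigma^m = \Sigma$;

• Automated switches: For every $i = 1, \ldots, m-1$, $x \in X$, and $(u, d) \in \mathrm{In}$, $\mathrm{Reset}(q_i, x, \sigma, u, d) = (q_{i+1}, x)$
if $\sigma = \sigma_{i+1}$, and $\mathrm{Reset}(q_i, x, \sigma, u, d) = \emptyset$ otherwise; for $i = m$, $\mathrm{Reset}(q_m, x, \sigma, u, d) = \emptyset$,
$\forall x \in X$, $\sigma \in \Sigma$, $(u, d) \in \mathrm{In}$.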

The above definition describes a hybrid system in which the set of discrete inputs consists of the
switching controls for transitions between the successive discrete modes. It can be verified using
the conditions given in Lygeros et al. (1999a) that for any initial condition $(q_0, x_0) \in \mathrm{Init}$ and any
measurable realizations of the input signals $\sigma(\cdot)$, $u(\cdot)$, $d(\cdot)$, there exists a unique infinite execution
for this system. This execution proceeds as follows. The system state is initialized in the first
discrete mode $q_1$ within a set $X_0$. The continuous trajectory evolves according to the continuous
dynamics in $q_1$ until a switching input $\sigma_2$ to transition to $q_2$ is applied. A discrete jump is then
taken to $q_2$ and the continuous trajectory evolves according to the continuous dynamics in $q_2$. This
proceeds until the discrete trajectory reaches mode $q_m$, after which the trajectory evolves
according to the dynamics in $q_m$ without any further discrete transitions.
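
As a minimal illustration, the reset relation of Definition 2.4 can be written as a small function. The Python sketch below is not part of the original development: the integer encoding of the modes $q_i$ and switching inputs $\sigma_i$ as $1, \ldots, m$, the name `reset_automated`, and the use of `None` to stand for the empty set are all assumptions made for illustration.

```python
from typing import Optional, Tuple

State = Tuple[float, ...]  # continuous state x, stored as a tuple of floats


def reset_automated(i: int, x: State, sigma: int, m: int) -> Optional[Tuple[int, State]]:
    """Reset map of Definition 2.4, with modes and switching inputs indexed 1..m.

    Returns the post-jump state (i + 1, x) when the jump is enabled, and
    None (standing in for the empty set) otherwise.
    """
    if not (1 <= i <= m and 1 <= sigma <= m):
        raise ValueError("mode and switching input must lie in 1..m")
    if i < m and sigma == i + 1:
        # The discrete state advances; the continuous state is unchanged.
        return (i + 1, x)
    # No jump is enabled; in particular, mode m has no outgoing jumps.
    return None
```

With $m = 3$, for instance, `reset_automated(1, (0.0,), 2, 3)` gives `(2, (0.0,))`, while any call with `i = 3` gives `None`, reflecting that $q_m$ has no outgoing transitions.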

Due to the presence of continuous disturbances, controller design for this class of systems will
be carried out in terms of feedback control policies. In particular, we will consider switching
control policies of the form $F : Q \times X \to \Sigma_1$ such that $\sigma(t) = F(q(t), x(t))$, $\forall t \ge 0$, and continuous
control policies of the form $K : Q \times X \to U$ such that $u(t) = K(q(t), x(t))$, $\forall t \ge 0$. Taken together,
a control policy $(F, K)$ is said to be admissible if for any measurable realization of the disturbance
$d$, there exists a unique infinite execution for the closed-loop system.

2.3.3  Semi-Automated  Sequential  Transition  Systems 

Instead  of  a  fully  automated  system,  there  may  be  cases  in  which  one  would  want  to  allow  for 
a  degree  of  human  intervention  in  order  to  guard  against  contingencies  during  system  operation. 
Here  we  will  consider  interventions  in  the  form  of  commands  determining  if  and  when  the  system 
should proceed to the next task in the mode sequence. This is a simple example of a
mixed-initiative system, in which the system is subject to both human and automated decisions.
Systems of this type are a subject of ongoing research on human-UAV interactions (Lam et al., 2008;
Cummings and Mitchell, 2008).

Our approach to modeling this interaction is to interpret the discrete control space $\Sigma_2$ as the set
of human operator commands which provide confirmation that the system can proceed to the next
phase of operation. Intermediate operating modes are then introduced into the mode sequence,
with outgoing transitions that are specifically controlled by these commands. We refer to this as a
semi-automated sequential transition system.

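Definition 2.5 (Semi-Automated Sequential Transition System). A semi-automated sequential
transition system is a hybrid automaton $\mathcal{H}$ with

• Discrete state space: $Q = \{q_1, \bar{q}_1, q_2, \bar{q}_2, \ldots, q_m, \bar{q}_m\}$;

• Switching control space: $\Sigma = \Sigma_1 \times \Sigma_2$, where $\Sigma_1 = \{\sigma_1, \sigma_2, \ldots, \sigma_m\}$ and $\Sigma_2 = \{\bar{\sigma}_1, \bar{\sigma}_2, \ldots, \bar{\sigma}_m\}$;

• Initial conditions: $\mathrm{Init} = q_1 \times X_0$, with $X_0 \subset X$;

• Mode domains: $\mathrm{Dom} = \bigcup_{i=1}^{m} (q_i \times X \times \Sigma^i \times \mathrm{In}) \cup (\bar{q}_i \times X \times \bar{\Sigma}^i \times \mathrm{In})$,
where $\Sigma^i = (\Sigma_1 \setminus \{\sigma_{i+1}\}) \times \Sigma_2$ for $i = 1, \ldots, m$, and $\bar{\Sigma}^i = \Sigma_1 \times (\Sigma_2 \setminus \{\bar{\sigma}_{i+1}\})$ for $i = 1, \ldots, m-1$
and $\bar{\Sigma}^m = \Sigma$;

• Automated Switches: For every $i = 1, \ldots, m$, $x \in X$, and $(u, d) \in \mathrm{In}$, $\mathrm{Reset}(q_i, x, \sigma, u, d) = (\bar{q}_i, x)$
if $\sigma = (\sigma_{i+1}, \bar{\sigma})$ for some $\bar{\sigma} \in \Sigma_2$, and $\mathrm{Reset}(q_i, x, \sigma, u, d) = \emptyset$ otherwise;

• Externally Controlled Switches: For every $i = 1, \ldots, m-1$, $x \in X$, and $(u, d) \in \mathrm{In}$,
$\mathrm{Reset}(\bar{q}_i, x, \sigma, u, d) = (q_{i+1}, x)$ if $\sigma = (\sigma', \bar{\sigma}_{i+1})$ for some $\sigma' \in \Sigma_1$, and $\mathrm{Reset}(\bar{q}_i, x, \sigma, u, d) = \emptyset$
otherwise; for $i = m$, $\mathrm{Reset}(\bar{q}_m, x, \sigma, u, d) = \emptyset$, $\forall x \in X$, $\sigma \in \Sigma$, $(u, d) \in \mathrm{In}$.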

In the above definition, the discrete states $\{q_i\}_{i=1}^{m}$ can be interpreted as the modes in which
the task specifications of the sequential transition system are carried out. We refer to them as
transition states. On the other hand, the discrete states $\{\bar{q}_i\}_{i=1}^{m}$ can be interpreted as the modes in
which the system awaits confirmation by the operator to proceed to the next task. We refer to them
as stationary states. From the point of view of the operator, the operation of a semi-autonomous
system involves a sequence of phases in which the automation carries out a task in a transition state
and then pauses for further instruction in a stationary state.

As in the automated case, one can verify using the conditions given in Lygeros et al.
(1999a) that for any initial condition $(q_0, x_0) \in \mathrm{Init}$ and any measurable realizations of the input
signals $\sigma(\cdot)$, $u(\cdot)$, $d(\cdot)$, there exists a unique infinite execution for the semi-automated system.
A more formal description of the system execution can be given as follows. The system state is
initialized in the first discrete mode $q_1$ within a set $X_0$. The continuous trajectory evolves according
to the continuous dynamics in $q_1$ until a switching input $\sigma_2$ is applied. At this time, a discrete jump
is taken to $\bar{q}_1$ to wait for an external command. When the command $\bar{\sigma}_2$ is received to proceed to $q_2$,
the system trajectory takes a discrete jump to $q_2$, and the continuous trajectory evolves in $q_2$ until
a command $\sigma_3$ is received to transition to $\bar{q}_2$. This proceeds until the discrete trajectory reaches
$\bar{q}_m$, after which the trajectory evolves according to the dynamics in $\bar{q}_m$ without any further
discrete transitions.

Controller design for this class of systems also consists of a choice of switching policy
$F : Q \times X \to \Sigma_1$, as well as a choice of continuous control policy $K : Q \times X \to U$. However, it should
be noted that the choice of switching policy in the stationary states is largely irrelevant, as the
outgoing transitions are controlled by external commands. A control policy $(F, K)$ is said to be
admissible if for any measurable realization of the disturbance $d$, there exists a unique infinite
execution for the closed-loop system.

2.4 Sequential Reachability Problems

2.4.1  Specification  with  Reach-avoid  Objectives 

First, consider the case of an automated sequential transition system and a problem specification
in which the objective in each mode $q_i$ is to drive the continuous state $x$ into a desired target set
$R_i \subset X$ within finite time, while avoiding a set $A_i \subset X$. Here the sets $R_i$ could for example represent
a sequence of waypoints, while the sets $A_i$ could for example represent unsafe operating conditions
or obstacles in the environment.

Problem 2.1 (Sequential Reachability Problem for Automated Transition System). Given an
automated sequential transition system $\mathcal{H}$, target sets $R_i \subset X$, $i = 1, \ldots, m$, and avoid sets $A_i \subset X$,
$i = 1, \ldots, m$, choose an admissible control policy $(F, K)$ such that for any measurable realization of
the disturbance $d$ satisfying $d(t) \in D$, $\forall t \ge 0$, the unique infinite execution of $\mathcal{H}$ satisfies

1. $(q(t_i), x(t_i)) \in q_i \times R_i$ for some sequence of times $t_0 = 0 \le t_1 \le \cdots \le t_m < \infty$;

2. On any time interval $[\tau_k, \tau_k'] \in \tau$ such that $q(\cdot) = q_i$, $x(t) \notin A_i$, $\forall t \in [\tau_k, \tau_k']$.

2.4.2  Specification  with  Reach-avoid  and  Invariance  Objectives 

Now  we  consider  the  case  of  a  semi-automated  sequential  transition  system.  It  is  assumed  that  the 
objectives  of  the  sequence  of  transition  modes  are  still  reach-avoid  objectives.  However,  within 
the  stationary  modes,  the  objectives  are  of  the  invariance  type,  namely  stay  within  a  neighborhood 
of  a  target  set,  until  a  command  is  given  to  perform  the  next  task  in  the  sequence. 

Problem 2.2 (Sequential Reachability Problem for Semi-Automated Transition System). Given a
semi-automated sequential transition system $\mathcal{H}$, target sets $R_i \subset X$, $i = 1, \ldots, m$, avoid sets $A_i \subset X$,
$i = 1, \ldots, m$, and target neighborhoods $W_i \subset X$ satisfying $R_i \subset W_i \subset A_i^C$, choose an admissible control
policy $(F, K)$ such that for any measurable realization of the disturbance $d$ satisfying $d(t) \in D$,
$\forall t \ge 0$, and any measurable realization of the external switching command $\sigma_2$ satisfying $\sigma_2(t) \in \Sigma_2$,
$\forall t \ge 0$, the unique infinite execution of $\mathcal{H}$ satisfies

1. If $q(t) = q_i$ for some $t \in \tau$, then there exists $t_i < \infty$ such that $(q(t_i), x(t_i)) \in q_i \times R_i$;
furthermore, on any time interval $[\tau_k, \tau_k'] \in \tau$ such that $q(\cdot) = q_i$, $x(t) \notin A_i$, $\forall t \in [\tau_k, \tau_k']$;

2. On any time interval $[\tau_k, \tau_k'] \in \tau$ such that $q(\cdot) = \bar{q}_i$, $x(t) \in W_i$, $\forall t \in [\tau_k, \tau_k']$.

In other words, if the discrete trajectory reaches a transition state $q_i$, then the target set $R_i$ is to
be attained within finite time while staying away from the avoid set $A_i$. Moreover, whenever the
discrete trajectory reaches a stationary state $\bar{q}_i$, the continuous trajectory remains within the target
neighborhood $W_i$ until the discrete trajectory jumps away from $\bar{q}_i$. However, there may be cases
in which a command to switch to $q_{i+1}$ is never received, and the state trajectory remains within
$\bar{q}_i \times W_i$ indefinitely.

2.4.3 Formulation in Terms of Continuous Reachability Problems


As  discussed  in  section  2.1,  our  approach  to  problems  2.1  and  2.2  is  to  choose  an  appropriate 
switching policy so as to reduce these problems to a sequence of continuous control design
problems. In particular, given the sequential nature of these problems, as well as the fact that the
objective of the system in each transition mode is to reach a target set in finite time, there is no
reason for the system to dwell in a given transition mode once the reach-avoid objective is attained.
Thus,  a  reasonable  choice  of  switching  policy  is  to  transition  to  the  next  mode  in  the  sequence 
once  a  target  set  in  a  transition  mode  is  reached.  Specifically,  we  consider  a  switching  policy  F 
satisfying 


$$F(q_i, x) = \begin{cases} \sigma_{i+1}, & x \in R_i \\ \sigma_i, & \text{otherwise,} \end{cases} \tag{2.1}$$


for every transition mode $q_i$, $i = 1, \ldots, m-1$. It turns out that this choice of switching policy is
sufficient for Problem 2.1. However, a slight modification of the switching region will be needed
to ensure the invariance objectives for Problem 2.2.
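
The switching policy (2.1) amounts to a membership test on the current target set. The following Python sketch is only an illustration under stated assumptions: modes and switching inputs are encoded as the integers $1, \ldots, m$, each target set $R_i$ is supplied as a hypothetical membership predicate, and the value returned in mode $q_m$ (where (2.1) is not defined) is chosen here to be $\sigma_m$.

```python
from typing import Callable, Sequence

State = Sequence[float]


def switching_policy(i: int, x: State,
                     in_target: Sequence[Callable[[State], bool]]) -> int:
    """Switching policy (2.1): index of the switching input to apply in mode q_i.

    in_target[i - 1](x) is assumed to return True exactly when x lies in R_i.
    """
    m = len(in_target)
    if not 1 <= i <= m:
        raise ValueError("mode index must lie in 1..m")
    if i < m and in_target[i - 1](x):
        return i + 1  # sigma_{i+1}: target reached, request the jump to q_{i+1}
    return i  # sigma_i: keep evolving in the current mode


# Hypothetical one-dimensional targets R_1 = [1, 2], R_2 = [4, 5], R_3 = [7, 8].
targets = [lambda x: 1.0 <= x[0] <= 2.0,
           lambda x: 4.0 <= x[0] <= 5.0,
           lambda x: 7.0 <= x[0] <= 8.0]
```

Here `switching_policy(1, [1.5], targets)` requests $\sigma_2$ because the state lies in $R_1$, whereas `switching_policy(1, [0.0], targets)` keeps $\sigma_1$.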

Now consider the problem of designing the continuous control policy $K$. From the problem
descriptions, one can deduce certain requirements for the continuous control design. In particular,
for the case of the automated sequential transition system, the continuous trajectories resulting
from a control law in transition mode $q_i$ should satisfy a reach-avoid objective, namely $x(t) \in R_i$
for some $t < \infty$, and $x(t') \notin A_i$ for every $t' < t$, over a subset of initial conditions in $\mathbb{R}^n$ which
ensures proper composition with the continuous trajectories of the previous mode $q_{i-1}$. Given the
choice of switching policy $F$, this set is simply given by the previous target set $R_{i-1}$. By solving
this sequence of reach-avoid problems, we obtain a control policy $K$ satisfying the specifications
of Problem 2.1.

In the case of the semi-automated sequential transition system, the specification also requires
that the continuous trajectories in each stationary mode $\bar{q}_i$ satisfy an invariance objective, namely
remain within a target neighborhood $W_i$ for all time. The control law design then needs to account
for the composition between transition and stationary modes. In particular, the control design
for a transition mode $q_i$ should ensure that the reach-avoid objective is achieved for every initial
condition in the previous target neighborhood $W_{i-1}$, while the control design for a stationary mode
$\bar{q}_i$ should ensure that the invariance objective is achieved for a subset of the target set $R_i$ (which
replaces the switching region in (2.1)). By solving this sequence of reach-avoid and invariance
problems, we obtain a control policy $K$ satisfying the specifications of Problem 2.2.


2.5 Overview of Hamilton-Jacobi Reachability

By treating a sequential reachability problem as a sequence of continuous reachability problems,
our controller design procedures for Problems 2.1 and 2.2 will involve the use of continuous time
reachability analysis as a design tool for the continuous control laws. In this section, we will review
some basic forms of reachable sets, as well as a method for computing an approximation of these
sets for nonlinear continuous time systems, based upon the numerical solution of an appropriate
Hamilton-Jacobi PDE (Mitchell et al., 2005). For the rest of this section, we will assume the
following system dynamics:

$$\dot{x} = f(x, u, d), \quad x(0) = x_0, \tag{2.2}$$

where $x \in \mathbb{R}^n$ is the continuous state, $u \in U$ is the control input, and $d \in D$ is the disturbance
input. Here we assume that the sets $U$ and $D$ are compact. In order to apply the computational
procedure described in Mitchell et al. (2005) to our problem, the regularity assumptions on the
vector field $f$ need to be slightly strengthened as compared with Assumption 2.1.

Assumption 2.2. The vector field $f$ is uniformly continuous and bounded, and for fixed $d \in D$,
the function $(x, u) \mapsto f(x, u, d)$ is Lipschitz continuous.

2.5.1  Capture  Set 

For a given target set $R \subset \mathbb{R}^n$ and time horizon $T > 0$, the capture set of (2.2) is the set of initial
conditions $x_0$ for which there exists a choice of control strategy such that, regardless of the choice
of disturbance strategy, there exists a time instant $t \in [0, T]$ such that $x(t) \in R$. If one were to view
this as a zero-sum differential game (see for example Isaacs, 1967; Evans and Souganidis, 1984;
Başar and Olsder, 1999) in which the objective of the control is to reach the set $R$ within $[0, T]$, then
this is the set of winning initial conditions for the control. A formal definition for this set requires
some amount of notation and concepts from differential games (Mitchell et al., 2005). For our
purposes, however, it is sufficient to consider a definition for the case in which the control strategy
is fixed. As a notational convenience, we define the set of admissible disturbance realizations over
a time interval $[0, T]$ as

$$\mathcal{D}_T = \{d : [0, T] \to D \mid d(\cdot) \text{ is measurable}\}.$$

Definition 2.6 (Capture Set). Given a target set $R$, a time horizon $T$, and a Lipschitz continuous
feedback law $K : \mathbb{R}^n \to U$, the capture set $\mathcal{R}(R, K, T)$ of (2.2) is given by

$$\mathcal{R}(R, K, T) = \{x_0 \in X : \forall d(\cdot) \in \mathcal{D}_T, \exists t \in [0, T], x(t) \in R\},$$

where $x(\cdot)$ is the solution of $\dot{x}(t) = f(x(t), K(x(t)), d(t))$, $x(0) = x_0$ on the interval $[0, T]$.

By fixing the continuous feedback law $K$, the problem of computing a capture set $\mathcal{R}(R, K, T)$
becomes an optimal control problem, namely one in which the objective of the disturbance is
to ensure that $x(t) \notin R$, $\forall t \in [0, T]$. It then comes as little surprise that, under certain technical
conditions, this set can be computed from the solution of an appropriate Hamilton-Jacobi-Bellman
(HJB) equation (Bardi and Capuzzo-Dolcetta, 1997).

More specifically, we assume that the target set $R$ is closed and can be represented as the zero
sublevel set of a bounded Lipschitz continuous function $\phi_R : \mathbb{R}^n \to \mathbb{R}$ such that

$$R = \{x \in \mathbb{R}^n : \phi_R(x) \le 0\}.$$

The function $\phi_R$ is sometimes referred to as the level set representation of $R$ (Sethian, 1999; Osher
and Fedkiw, 2002). Now consider the HJB equation

$$\frac{\partial \phi}{\partial t} + \min\left[0, H\left(x, \frac{\partial \phi}{\partial x}\right)\right] = 0, \quad \phi(x, 0) = \phi_R(x) \tag{2.3}$$

with the optimal Hamiltonian

$$H(x, p) = \max_{d \in D} p^T f(x, K(x), d). \tag{2.4}$$

Let $\phi : \mathbb{R}^n \times [-T, 0] \to \mathbb{R}$ be the unique viscosity solution (Crandall and Lions, 1983) to (2.3) and
(2.4). Then by a special case of the argument presented in Mitchell et al. (2005),

$$\mathcal{R}(R, K, T) = \{x \in \mathbb{R}^n : \phi(x, -T) \le 0\}.$$
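
When the disturbance enters the closed-loop dynamics affinely and is bounded componentwise, the optimization over $d$ in the Hamiltonian has a closed form. The Python sketch below illustrates this special case only; the decomposition $f(x, K(x), d) = f_0(x) + G(x) d$ with $|d_j| \le \bar{d}_j$, and the names `hamiltonian`, `f0`, `G`, and `d_max`, are assumptions introduced here and are not taken from the text. Under that assumption, $\max_d p^T (f_0 + G d) = p^T f_0 + \sum_j |(G^T p)_j| \bar{d}_j$, and the minimum is obtained by flipping the sign of the second term.

```python
from typing import Sequence

Vector = Sequence[float]


def hamiltonian(p: Vector, f0: Vector, G: Sequence[Vector],
                d_max: Vector, sign: int) -> float:
    """Optimal Hamiltonian for dynamics f0 + G d with |d_j| <= d_max[j].

    sign = +1 gives max over d of p^T (f0 + G d), the form used in (2.4);
    sign = -1 gives min over d of the same expression, the form used for unsafe sets.
    f0 and G are the values of the drift and disturbance matrix at one state x.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    drift = sum(p_r * f_r for p_r, f_r in zip(p, f0))
    extremal = 0.0
    for j, bound in enumerate(d_max):
        # j-th entry of G^T p: sensitivity of p^T f to disturbance channel j
        gtp_j = sum(G[r][j] * p[r] for r in range(len(p)))
        extremal += abs(gtp_j) * bound
    return drift + sign * extremal
```

In a grid-based solver, a function of this kind would be evaluated at every grid node with `p` set to the numerical approximation of the spatial gradient of $\phi$.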

On a computational note, numerical solutions of H-J equations can be calculated on a grid of
the continuous state space $\mathbb{R}^n$ using the MATLAB Toolbox for Level Set Methods developed by
Mitchell (2007a). It is based upon an implementation of the level set theory and computational
methodologies described extensively in the texts by Osher and Fedkiw (2002) and Sethian (1999).
The numerical solutions provide convergent approximations of the true solutions of (2.3) as the grid
is refined. However, the computational cost of obtaining accurate approximations
scales exponentially in the number of continuous dimensions. This currently limits the application
of this method to continuous models with $n \le 5$.

2.5.2  Unsafe  Set 

For a given avoid set $A \subset \mathbb{R}^n$ and time horizon $T > 0$, the unsafe set of (2.2) is the set of initial
conditions $x_0$ for which, regardless of the choice of control strategy, there exists a choice of
disturbance strategy and a time instant $t \in [0, T]$ such that $x(t) \in A$. If one were to view this as a zero-sum
differential game in which the objective of the control is to avoid the set $A$ over $[0, T]$, then this is
the set of winning initial conditions for the disturbance. As before, we will consider a definition
for this set in the case that the control strategy is fixed.

Definition 2.7 (Unsafe Set). Given an avoid set $A$, a time horizon $T$, and a Lipschitz continuous
feedback law $K : \mathbb{R}^n \to U$, the unsafe set $\mathcal{A}(A, K, T)$ of (2.2) is given by

$$\mathcal{A}(A, K, T) = \{x_0 \in X : \exists d(\cdot) \in \mathcal{D}_T, \exists t \in [0, T], x(t) \in A\},$$

where $x(\cdot)$ is the solution of $\dot{x}(t) = f(x(t), K(x(t)), d(t))$, $x(0) = x_0$ on the interval $[0, T]$.

From this definition, it can be observed that the only difference between a capture set and an
unsafe set lies in the objective of the control. Namely, in the former case, the control tries to reach
some terminal set $R$, while in the latter case, it tries to avoid some terminal set $A$. Correspondingly,
computation of the unsafe set proceeds by minor modification of the method given for the capture
set. In particular, we assume as before that there exists a bounded Lipschitz continuous function
$\phi_A : \mathbb{R}^n \to \mathbb{R}$ such that

$$A = \{x \in \mathbb{R}^n : \phi_A(x) \le 0\}.$$

Consider the HJB equation as given in (2.3), with terminal condition $\phi(x, 0) = \phi_A(x)$ and the optimal Hamiltonian

$$H(x, p) = \min_{d \in D} p^T f(x, K(x), d). \tag{2.5}$$

Let $\phi : \mathbb{R}^n \times [-T, 0] \to \mathbb{R}$ be the unique viscosity solution to (2.3) and (2.5). Then by another
application of the argument presented in Mitchell et al. (2005),

$$\mathcal{A}(A, K, T) = \{x \in \mathbb{R}^n : \phi(x, -T) \le 0\}.$$

For the discussions on controller design, it is important to note that the complement of the
unsafe set, denoted as $\mathcal{A}^C(A, K, T) := \mathbb{R}^n \setminus \mathcal{A}(A, K, T)$, is the set of initial conditions $x_0$ for which
the trajectory of (2.2) under control law $K$ avoids the set $A$ over $[0, T]$, regardless of the admissible
disturbance realization $d(\cdot) \in \mathcal{D}_T$.

2.5.3  Invariant  Set 

For a given set $W \subset \mathbb{R}^n$, an invariant subset of $W$ under (2.2) is a set of initial conditions $x_0$ for
which there exists a choice of control strategy such that, regardless of the choice of disturbance
strategy, the trajectory of (2.2) satisfies $x(t) \in W$, $\forall t \ge 0$. The union of all such sets is called the
maximal invariant set. A definition of this set is given below for the case in which the control
strategy is fixed.

Definition 2.8 (Maximal Invariant Set). Given a set $W \subset \mathbb{R}^n$ and a Lipschitz continuous feedback
law $K : \mathbb{R}^n \to U$, the maximal invariant set $\mathrm{Inv}(W, K)$ of (2.2) is given by

$$\mathrm{Inv}(W, K) = \{x_0 \in X : \forall d(\cdot) \in \mathcal{D}_T, \forall t \ge 0, x(t) \in W\},$$

where $x(\cdot)$ is the solution of $\dot{x}(t) = f(x(t), K(x(t)), d(t))$, $x(0) = x_0$ on the interval $[0, T]$.

As discussed in Tomlin et al. (2000), this set can be computed as an extension of the finite
horizon unsafe set computation to the infinite horizon case. In particular, the problem can be viewed
as a zero-sum differential game in which the objective of the control is to avoid the complement of
$W$ at all times.

Over any finite time horizon $[0, T]$, the set of winning initial conditions for the control in this
differential game can be computed as $\mathcal{A}^C(\mathbb{R}^n \setminus W, K, T)$, using the procedures described in Section
2.5.2. Let $\phi_T : \mathbb{R}^n \to \mathbb{R}$ be a level set representation of this set. Then if $\phi_T$ converges to some
function $\phi^*$ as $T \to \infty$, namely if the dynamic programming procedure as described by the HJB
equation (2.3) and (2.5) converges to a fixed point, then $\phi^*$ provides a level set representation of
the maximal invariant set

$$\mathrm{Inv}(W, K) = \{x \in \mathbb{R}^n : \phi^*(x) \le 0\}.$$

Furthermore, it can be verified that this set is invariant with respect to itself, namely for every initial
condition $x_0 \in \mathrm{Inv}(W, K)$, the trajectory of (2.2) under control law $K$ satisfies $x(t) \in \mathrm{Inv}(W, K)$,
$\forall t \ge 0$, regardless of the disturbance realization.
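
The fixed-point character of this computation can be seen in a much simpler setting. The Python sketch below is a discrete-time, finite-state analogue and not the level set procedure itself: states are integers, the closed-loop dynamics with disturbance are given by a hypothetical `successors` map, and states of $W$ with a successor outside the current candidate set are discarded until nothing changes.

```python
from typing import Callable, Iterable, Set


def maximal_invariant_subset(W: Iterable[int],
                             successors: Callable[[int], Iterable[int]]) -> Set[int]:
    """Largest subset S of W such that every successor of every state in S lies in S."""
    current = set(W)
    while True:
        # Keep only the states whose successors, for all disturbances, stay inside.
        kept = {s for s in current if all(t in current for t in successors(s))}
        if kept == current:  # fixed point reached
            return current
        current = kept


def successors(s: int) -> Set[int]:
    """Hypothetical closed-loop dynamics on the integers with disturbance d in {-1, 0, 1}.

    Within distance 2 of the state 5 the controller moves one step toward 5;
    farther away the state drifts one step away from 5.
    """
    toward_centre = (5 > s) - (5 < s)  # sign of (5 - s)
    step = toward_centre if abs(s - 5) <= 2 else -toward_centre
    return {s + step + d for d in (-1, 0, 1)}
```

For $W = \{1, \ldots, 9\}$ the iteration removes the states 1, 2, 8, and 9 and stops at $\{3, \ldots, 7\}$, which is invariant with respect to itself in the sense described above.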
2.6 Controller Design Procedures

As discussed in section 2.4.3, through an appropriate choice of switching policy as per equation
(2.1), the task of controller design for Problems 2.1 and 2.2 can be formulated in terms of a
collection of continuous reachability problems. The continuous control design, however, needs to ensure
both the reachability specification for each discrete state, as well as compatibility conditions
between successive discrete states. In this section, we describe procedures for performing this control
design, using reachability analysis as a design tool.

For notational convenience, the subscript $q$ will be used to denote capture sets, unsafe sets,
and invariant sets computed for a particular mode in the mode sequence. In particular, for a given
target set $R \subset \mathbb{R}^n$, time horizon $T > 0$, and feedback law $K : \mathbb{R}^n \to U$, the capture set under the
continuous dynamics in mode $q_i \in Q$ is denoted as $\mathcal{R}_{q_i}(R, K, T)$.

2.6.1  Automated  Sequential  Transition  Systems  with  Reach-avoid 
Specifications 

We first present a design procedure for Problem 2.1. In particular, during each phase of the design
procedure, we design a control law for mode $q_i$ to ensure that the target set $R_i$ can be attained.
Reachability calculations are then performed to check whether a compatibility condition is met,
namely whether the set of initial conditions satisfying the reach-avoid objectives under this control
law contains the target set of the previous discrete state $q_{i-1}$. The control law is then adjusted as
necessary to satisfy this condition. This is described more precisely below.

Let $\mathcal{H}$ be an automated sequential transition system such that the vector field $f$ of $\mathcal{H}$
satisfies Assumption 2.2 for each discrete state $q \in Q$. Then given target sets $R_i \subset X$, $i = 1, \ldots, m$,
and avoid sets $A_i \subset X$, $i = 1, \ldots, m$, a control policy $(F, K)$ can be designed using the following
procedure, starting with mode $q_m$.

1. Design a continuous control law $K(q_i, \cdot)$ which regulates initial conditions in $R_{i-1}$ to the
target set $R_i$, under the continuous dynamics $\dot{x} = f(q_i, x, K(q_i, x), d)$.

2. Compute the capture set under this control law to the first time instant $\tau_i$ such that
$R_{i-1} \subset \mathcal{R}_{q_i}(R_i, K(q_i, \cdot), \tau_i)$.

3. Compute over the time interval $[0, \tau_i]$ the corresponding unsafe set $\mathcal{A}_{q_i}(A_i, K(q_i, \cdot), \tau_i)$.

4. Check if the condition $R_{i-1} \subset \mathcal{A}^C_{q_i}(A_i, K(q_i, \cdot), \tau_i)$ holds. If this condition does not hold,
return to step 1 to modify the design of $K(q_i, \cdot)$. Otherwise, the control design for mode $q_i$ is
complete.

5. Repeat steps 1-4 for $q_{i-1}$ until $q_1$. For $q_1$, set $R_0 = X_0$.

6. Choose a switching policy $F$ according to (2.1).

It can be verified using the conditions given in Lygeros et al. (1999a) that under the choice
of control policy $(F, K)$ as designed above, the automated sequential transition system $\mathcal{H}$ has a
unique infinite execution. Furthermore, using the definitions of capture sets and unsafe sets as given
in section 2.5, it can be verified that this execution satisfies the specifications of Problem 2.1. In
particular, by steps 1-4 of the design procedure, the feedback law $K$ satisfies

$$R_{i-1} \subset \mathcal{R}_{q_i}(R_i, K(q_i, \cdot), \tau_i) \cap \mathcal{A}^C_{q_i}(A_i, K(q_i, \cdot), \tau_i), \tag{2.6}$$

for every $i = 1, \ldots, m$, with $R_0 = X_0$. This ensures that, for each mode $q_i$, any continuous trajectory
initialized from within $R_{i-1}$ will reach $R_i$ within $\tau_i$ time units while avoiding $A_i$, regardless of the
realization of the disturbance $d$. Furthermore, given the choice of switching policy $F$, continuous
state evolution in each mode $q_i$ is assured to be only initialized from within $R_{i-1}$. The desired
properties then follow.
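
The control flow of steps 1-5 and the compatibility condition (2.6) can be summarized in a short Python sketch. It is only an outline under several assumptions: sets are represented as finite sets of grid-cell indices, the redesign of a control law is modeled as trying the next entry in a list of candidate laws, and the callables `capture_set` and `unsafe_set` are hypothetical placeholders for the reachability computations of Section 2.5 (for mode $q_i$ they are assumed to already know $R_i$ and $A_i$).

```python
from typing import Callable, Dict, FrozenSet, Hashable, Optional, Sequence, Tuple

GridSet = FrozenSet[int]
Design = Optional[Tuple[Hashable, float]]


def design_mode(prev_target: GridSet, laws: Sequence[Hashable],
                horizons: Sequence[float],
                capture_set: Callable[[Hashable, float], GridSet],
                unsafe_set: Callable[[Hashable, float], GridSet]) -> Design:
    """Steps 1-4 for one mode: return (law, horizon) satisfying (2.6), or None."""
    for law in laws:  # step 1; each later candidate plays the role of a redesign
        for horizon in sorted(horizons):
            # Step 2: first horizon whose capture set contains R_{i-1}.
            if prev_target <= capture_set(law, horizon):
                # Steps 3-4: R_{i-1} must lie in the complement of the unsafe set.
                if prev_target.isdisjoint(unsafe_set(law, horizon)):
                    return law, horizon
                break  # compatibility fails: return to step 1
    return None


def design_sequence(targets: Sequence[GridSet], x0: GridSet,
                    laws: Dict[int, Sequence[Hashable]],
                    horizons: Sequence[float],
                    capture_set: Callable[[int, Hashable, float], GridSet],
                    unsafe_set: Callable[[int, Hashable, float], GridSet]
                    ) -> Dict[int, Design]:
    """Step 5: treat q_m, ..., q_1 in turn, with targets[i - 1] = R_i and R_0 = X_0."""
    design: Dict[int, Design] = {}
    for i in range(len(targets), 0, -1):
        prev_target = targets[i - 2] if i > 1 else x0
        design[i] = design_mode(
            prev_target, laws[i], horizons,
            lambda law, horizon, i=i: capture_set(i, law, horizon),
            lambda law, horizon, i=i: unsafe_set(i, law, horizon))
    return design
```

A `None` entry in the returned dictionary indicates that none of the supplied candidate laws met condition (2.6) for that mode, so a further redesign would be needed.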

Remark 2.1. The control design in step 1 can be viewed as a reference tracking problem, for
which a number of design methods have been proposed in the nonlinear control literature (see for
example Sastry, 1999). In particular, one can choose a point $\hat{x} \in R_i$ as a constant reference and
design a controller in the relative coordinates $\tilde{x} = x - \hat{x}$. However, the difficulty lies in the need to
satisfy a safety constraint on $x$ and an input constraint on $u$, possibly in the presence of a disturbance
$d$. In chapter 3, we discuss a reachability-based approach to this problem in terms of switching
control policies.

Remark 2.2. The choice of compatibility condition (2.6) is somewhat conservative due to the fact
that there may exist initial conditions in the unsafe set $\mathcal{A}_{q_i}(A_i, K(q_i, \cdot), \tau_i)$ which reach $R_i$ before $A_i$,
but are nonetheless precluded and conservatively labeled unsafe. To reduce this conservatism, a
modified reachability calculation combining target attainability and safety objectives can be
performed, by solving a constrained H-J PDE (Mitchell, 2002). This would then replace the capture
sets and unsafe sets in the control design procedure. The method given here is chosen for simplicity
of presentation and ease of computation.

2.6.2  Semi-Automated  Sequential  Transition  Systems  with  Reach-avoid 
and  Invariance  Specifications 

Next, we present a design procedure for Problem 2.2. In this case, the reach-avoid objectives for
the transition states $q_i$ can still be satisfied by following a procedure similar to that described in the
preceding section. However, some additional design steps are necessary in order to ensure that
the invariance objectives are met, and that the stationary states are properly composed with the
transition states. The precise sequence of steps is given below.

Let $\mathcal{H}$ be a semi-automated sequential transition system such that the vector field $f$ of $\mathcal{H}$
satisfies Assumption 2.2 for each discrete state $q \in Q$. Then given target sets $R_i \subset X$, $i = 1, \ldots, m$,
avoid sets $A_i \subset X$, $i = 1, \ldots, m$, and target neighborhoods $W_i \subset X$ satisfying $R_i \subset W_i \subset A_i^C$, a control
policy $(F, K)$ can be designed using the following procedure, starting with mode $q_m$.

1. Design a continuous control law $K(\bar{q}_i, \cdot)$ which ensures that trajectories initialized from within
$R_i$ (or a subset thereof) stay within $W_i$, under the dynamics $\dot{x} = f(\bar{q}_i, x, K(\bar{q}_i, x), d)$.

2. Compute the maximal invariant set $\mathrm{Inv}(W_i, K(\bar{q}_i, \cdot))$ under this control law.

3. If $\mathrm{Inv}(W_i, K(\bar{q}_i, \cdot)) \cap R_i \ne \emptyset$, choose a target set $\tilde{R}_i \subset \mathrm{Inv}(W_i, K(\bar{q}_i, \cdot)) \cap R_i$. Otherwise, return
to step 1 to modify the design of $K(\bar{q}_i, \cdot)$.

4. Design a continuous control law $K(q_i, \cdot)$ which regulates initial conditions in $W_{i-1}$ to the
target set $\tilde{R}_i$, under the continuous dynamics $\dot{x} = f(q_i, x, K(q_i, x), d)$.

5. Compute the capture set under this control law to the first time instant $\tau_i$ such that
$W_{i-1} \subset \mathcal{R}_{q_i}(\tilde{R}_i, K(q_i, \cdot), \tau_i)$.

6. Compute over the time interval $[0, \tau_i]$ the corresponding unsafe set $\mathcal{A}_{q_i}(A_i, K(q_i, \cdot), \tau_i)$.

7. Check if the condition $W_{i-1} \subset \mathcal{A}^C_{q_i}(A_i, K(q_i, \cdot), \tau_i)$ holds. If this condition does not hold,
return to step 4 to modify the design of $K(q_i, \cdot)$. Otherwise, the control design for mode $q_i$ is
complete.

8. Repeat steps 1-7 for $q_{i-1}$ and $\bar{q}_{i-1}$ until $q_1$. For $q_1$, set $W_0 = X_0$.

9. Choose a switching policy $F$ according to (2.1), but replacing the switching region $R_i$ by $\tilde{R}_i$.

It can be verified using the conditions given in Lygeros et al. (1999a) that under the choice
of control policy $(F, K)$ as designed above, the semi-automated sequential transition system
has a unique infinite execution. Furthermore, using the definitions of capture sets, unsafe sets, and
invariant sets as given in section 2.5, it can be verified that this execution satisfies the specifications
of Problem 2.2. In particular, by steps 1-3 of the design procedure, the feedback law $K$ ensures
that, for each stationary mode $\bar{q}_i$, trajectories initialized from $\tilde{R}_i \subset R_i$ stay within $W_i$, for every $i = 1, \ldots, m$
and every admissible disturbance realization. Furthermore, by steps 4-7 of the design procedure, the
feedback law $K$ satisfies

$$W_{i-1} \subset \mathcal{R}_{q_i}(\tilde{R}_i, K(q_i, \cdot), \tau_i) \cap \mathcal{A}^C_{q_i}(A_i, K(q_i, \cdot), \tau_i), \tag{2.7}$$

for every $i = 1, \ldots, m$, with $W_0 = X_0$. This ensures that, for each transition mode $q_i$, any continuous trajectory
initialized from within $W_{i-1}$ will reach $\tilde{R}_i \subset R_i$ within $\tau_i$ time units while avoiding $A_i$, regardless
of the realization of the disturbance $d$. Given the choice of switching policy $F$, continuous state
evolution in each stationary mode $\bar{q}_i$ is assured to be initialized only from within $\tilde{R}_i$. On the other hand, due
to the invariance property of $K$ in the stationary modes, continuous state evolution in each transition mode $q_i$
is assured to be initialized only from within $W_{i-1}$. The desired properties then follow.
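
The containment condition in (2.7) can be illustrated on a gridded state space. The following is a minimal sketch, assuming the capture set and unsafe set have already been computed and stored as Python sets of grid-cell indices. The function name `satisfies_composition` and the one-dimensional toy cells are hypothetical and are not part of the design procedure itself.

```python
from typing import Hashable, Set


def satisfies_composition(w_prev: Set[Hashable],
                          capture: Set[Hashable],
                          unsafe: Set[Hashable]) -> bool:
    """Check that w_prev lies in the capture set and outside the unsafe set.

    All arguments are sets of grid-cell indices (an assumed discretization).
    """
    inside_capture = w_prev <= capture          # every cell reaches the target in time
    outside_unsafe = w_prev.isdisjoint(unsafe)  # no cell can be driven into the avoid set
    return inside_capture and outside_unsafe


# Toy example on a 1-D grid (hypothetical data).
capture_set = set(range(2, 9))
unsafe_set = {0, 1, 2}
print(satisfies_composition({4, 5, 6}, capture_set, unsafe_set))
print(satisfies_composition({2, 3}, capture_set, unsafe_set))
```

If the check fails for some mode, the procedure calls for redesigning the control law of that mode and recomputing both sets before the check is repeated.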

Remark 2.3. For certain specifications of the target set $R_i$, target neighborhood $W_i$, continuous
dynamics $f$, and input bounds $U$ and $D$, there may not exist a subset $\tilde{R}_i$ of $R_i$ such that every
trajectory initiated from $\tilde{R}_i$ remains inside $W_i$ at all times. To check the feasibility of an invariance
objective, one may consider performing an invariant set calculation using a differential game
formulation of the problem as described in Tomlin et al. (2000), and verify the condition given in step
3. In the case that an invariance objective is found to be infeasible, one may consider modifying
the specification of the target set $R_i$ or the target neighborhood $W_i$.

Remark 2.4. The control design in step 1 can be viewed as a stabilization problem, by choosing
some point $\bar{x} \in R_i$ and designing a stabilizing controller in the relative coordinate system $\tilde{x} := x - \bar{x}$.
For nonlinear systems, this design can be performed for example using Lyapunov-based techniques
(Sastry, 1999). However, the difficulty again lies in finding control designs which satisfy the state
constraint $W_i$ and the input constraint $U$, while accounting for continuous disturbances. In chapter
3, a reachability-based approach to this problem will be discussed.

2.7  Recovery  from  Improper  Initialization 

The design procedures described in the preceding section provide assurances that, under operating
conditions which satisfy the assumptions of the system model, the desired specifications will
be achieved. However, given the myriad of contingency scenarios which can arise during actual
system operation, a system designer also needs to account for run-time fault conditions which
cause the assumptions of the system model to be violated. For certain classes of faults, appropriate
design choices can be made to enable recovery from the fault condition in an automatic fashion,
for example through built-in redundancies. Nonetheless, because not every contingency
scenario can be anticipated at design time, some level of human supervision may be inevitable in
the event of a fault condition.

In this section, we will discuss a possible use for reachable sets as a visual aid to guide human
decision-making in the case of improper system initialization. Specifically, this is a scenario in
which the state of the sequential transition system is initialized outside of the designated set Init,
namely $(q_0, x_0) \notin \text{Init}$. Within the context of the aerial refueling example, this corresponds to a
scenario in which the refueling sequence is initiated from a position outside the first waypoint set
$X_0$. Such a scenario could arise for example due to operator mistakes, miscommunication between
the UAV operator and the tanker pilot, or complex missions with multiple aircraft operating in
proximity of each other. Using reachable sets as visual guidance can be helpful for motion planning
applications in which the sets are computed in the planning space and hence provide the operator
with a sense of the reachable space of the underlying continuous system.

To address this fault condition, the system designer can consider adding a finite number of
general purpose recovery modes $\{\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_{\hat{m}}\}$, corresponding to dynamics $\dot{x} = f(\hat{q}_i, x, u, d)$, with
choices of feedback laws $K(\hat{q}_i, \cdot)$. In the case of AAR, this could for example be a set of escape
maneuvers. The problem of recovering from the fault condition then becomes one of constructing a
recovery sequence from the library of recovery modes at run-time, in order to drive the continuous
state of the system into the feasible set of a mode $q \in Q$ of the original sequential transition system.
This feasible set can for example be derived from the compatibility condition given in (2.6).

First, for each recovery mode $\hat{q}_i$, an unsafe set computation can be performed at design time to
determine the set of unsafe initial conditions $\mathcal{A}_{\hat{q}_i}(A, K(\hat{q}_i, \cdot), \hat{\tau}_i)$, over some appropriate choice of
time horizon $\hat{\tau}_i$. The time horizon should be long enough so that the unsafe set does not provide
misleading information to the human operator, but also not so long that the resulting decisions are
rendered excessively conservative. Thus, appropriate choices of time horizons should be tailored
to the particular application.

At run-time, a human operator can consult these sets to determine the choice of recovery modes
that can be safely initiated. In particular, as long as the system is initialized at a state outside the
intersection of unsafe sets over all recovery modes, namely $x_0 \notin \bigcap_i \mathcal{A}_{\hat{q}_i}(A, K(\hat{q}_i, \cdot), \hat{\tau}_i)$, at least one
safe recovery mode $\hat{q}_i$ is available. From the set of safe recovery modes, a particular mode can
then be selected so as to make progress towards the feasible set of a transition state or stationary state
in the sequential transition system. During the execution of this maneuver over the time interval $[0, \hat{\tau}_i]$,
the operator can consult the computed unsafe sets and plan the next recovery mode $\hat{q}_j$ in the fault
recovery sequence. At a time when it is safe to perform maneuver $\hat{q}_j$, a command can be issued
to transition, and the procedure repeats until the system state recovers in a mode of the sequential
transition system (not necessarily $q_1$).
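
The run-time selection step can be sketched as a simple membership query. This is an illustrative sketch only: it assumes each precomputed unsafe set is available as a membership test on the continuous state, and the mode names and the toy half-space sets below are hypothetical stand-ins for actual reachability results.

```python
from typing import Callable, Dict, List, Sequence

State = Sequence[float]


def safe_recovery_modes(x: State,
                        unsafe_sets: Dict[str, Callable[[State], bool]]) -> List[str]:
    """Return the recovery modes whose unsafe set does not contain x."""
    return [mode for mode, in_unsafe in unsafe_sets.items() if not in_unsafe(x)]


# Hypothetical unsafe sets, given as membership tests on (x1, x2, x3).
unsafe_sets = {
    "escape_left": lambda x: x[1] > 0.0,
    "escape_right": lambda x: x[1] < 0.0,
    "slow_down": lambda x: x[0] < 10.0,
}

print(safe_recovery_modes((5.0, 3.0, 0.0), unsafe_sets))
```

An empty list corresponds to a state inside the intersection of all unsafe sets, for which no recovery mode in the library is guaranteed safe over its time horizon. Choosing among several safe modes remains an operator decision in the text; the sketch only filters.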

2.8  Aerial  Refueling  Example 

This section describes an application of the controller design methodology developed in this chapter
to the specific example of Automated Aerial Refueling (AAR). The discussion will primarily
focus on the case in which AAR is to be performed autonomously without human supervision
(Problem 2.1) in order to illustrate the basic mechanics of the design procedures introduced in section 2.6.
However, later on in the section, we will also briefly touch on the extension to invariance
objectives, the use of reachable sets for fault recovery, and the effects of disturbances.

2.8.1  Overview  of  Automated  Aerial  Refueling  (AAR)  Process 

In an aerial refueling process, a formation of unmanned aerial vehicles (UAVs) approaches a
human-piloted tanker aircraft. One by one, the UAVs perform a sequence of maneuvers to dock with
a human-operated fuel boom and then return to formation. A graphical top-down view of the
refueling process is shown in Fig. 2.2.

The tanker aircraft is shown in the center, with the refueling UAV flying in formation to be
refueled. In the actual refueling process, the UAV typically approaches from a fixed position in
the formation. For modeling purposes, the aircraft to be refueled is assumed to approach from
a position behind and to the right of the tanker aircraft. From this position, the UAV initiates a
sequence of maneuvers through the numbered waypoints, under a combination of human operator
commands and autonomous decisions. The sequence of maneuvers in the AAR process is shown
in Table 2.1. The model described here utilizes the separation of waypoints found in the work of
Ross et al. (2006).
Figure 2.2: Diagram of waypoint locations in aerial refueling process, as labeled 1 through 6. Each
flight maneuver corresponds to a transition between waypoints.

Event          Maneuver     Man. #  Description
$\sigma_{12}$  Detach 1     1       A single UAV detaches from a formation of UAVs in flight
                                    to a position slightly behind and to the right of a tanker
                                    aircraft.
$\sigma_{23}$  Precontact   2       The UAV banks left towards a position directly behind the
                                    tanker aircraft.
$\sigma_{34}$  Contact      3       The UAV approaches the tanker aircraft from behind to allow
                                    the boom operator on board the tanker to lower the fuel
                                    boom and catch the UAV.
$\sigma_{45}$  Postcontact  4       The UAV slows down and moves away from the tanker aircraft
                                    after the boom operator detaches the fuel boom.
$\sigma_{56}$  Detach 2     5       The UAV banks right towards a position directly behind the
                                    UAV formation.
$\sigma_{67}$  Rejoin       6       The UAV speeds up and rejoins the formation to complete
                                    the refuel sequence.

Table 2.1: Descriptions of flight maneuvers in the aerial refueling process.
2.8.2 Aircraft Model

Under the assumption that refueling occurs one vehicle at a time, we will focus our attention on the
interaction between a single UAV and the tanker aircraft. The kinematics model as described here
for the relative dynamics between the two aircraft leverages previous work by Tomlin et al. (2001)
in the modeling of aircraft conflict resolution scenarios in air traffic management. The model
assumes that the two aircraft do not change altitude significantly in performing the aerial refueling
maneuvers, and this is justified in the state of the practice for human-piloted maneuvers of this kind.
In fact, using a change in altitude might jeopardize the success of the mission, as a boom operator
might suspend the mission if loss of line of sight occurs; thus, there is motivation to preserve a
2D solution. Recent work by Williamson et al. (2009) provides promise that autonomous vehicles
will be capable of sufficiently accurate onboard sensing to utilize the selected coordinate system.
Placing the two aircraft in a 2D plane, the relative motion of the two aircraft in the UAV reference
frame can be modeled as:


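$$\dot{x} = f(x, u, d): \quad \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} -u_1 + d_1 \cos x_3 + u_2 x_2 \\ d_1 \sin x_3 - u_2 x_1 \\ -u_2 \end{bmatrix} \tag{2.8}$$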


where $x_1, x_2, x_3$ are the longitudinal, lateral, and heading coordinates of the tanker aircraft in the
UAV reference frame, $u_1, u_2$ are the translational and angular velocities of the UAV as indicated
in Fig. 2.3, and $d_1$ is the translational velocity of the tanker aircraft. Here it is assumed that the
tanker is in straight and level flight, and hence its angular velocity is set to zero. For most of
our reachability and simulation results, it is assumed that the tanker aircraft maintains a nominal
forward velocity $v_0$. However, as discussed in section 2.5, the reachability computation can be
modified in a straightforward manner to account for fluctuations in the tanker velocity within a
bounded range, and this case is covered in section 2.8.10, which demonstrates the corresponding
changes to the reachable set calculations.

With regard to parameter values, the nominal velocity of the tanker aircraft is chosen to be
$v_0 = 84.8$ m/s (75% of the maximum allowable velocity of the UAV); the velocity input $u_1$ for the
UAV has the saturation limits $[40, 113]$ m/s, and the angular velocity input $u_2$ has the saturation
limits $[-\pi/6, \pi/6]$ s$^{-1}$. The maximum UAV velocity value is based on published specifications
for the MQ-9 Predator B; other values are chosen based on realistic constraints.
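
As a concrete reading of the model and parameter values, the following sketch evaluates the relative kinematics with the inputs clipped to their saturation limits. It takes (2.8) to be $\dot{x}_1 = -u_1 + d_1 \cos x_3 + u_2 x_2$, $\dot{x}_2 = d_1 \sin x_3 - u_2 x_1$, $\dot{x}_3 = -u_2$. The function names and the forward-Euler step are assumptions made for illustration and are not the numerical scheme used for the reachability computations.

```python
import math

V0 = 84.8                                    # nominal tanker speed (m/s), from the text
U1_MIN, U1_MAX = 40.0, 113.0                 # UAV speed limits (m/s), from the text
U2_MIN, U2_MAX = -math.pi / 6, math.pi / 6   # UAV turn-rate limits (1/s), from the text


def relative_dynamics(x, u, d1=V0):
    """Right-hand side of the relative kinematics, with saturated inputs."""
    x1, x2, x3 = x
    u1 = min(max(u[0], U1_MIN), U1_MAX)
    u2 = min(max(u[1], U2_MIN), U2_MAX)
    return (-u1 + d1 * math.cos(x3) + u2 * x2,
            d1 * math.sin(x3) - u2 * x1,
            -u2)


def euler_step(x, u, dt, d1=V0):
    """One forward-Euler step (an assumed integrator, chosen for brevity)."""
    dx = relative_dynamics(x, u, d1)
    return tuple(xi + dt * dxi for xi, dxi in zip(x, dx))


# With matched speeds, zero turn rate, and aligned headings the relative state is stationary.
print(relative_dynamics((25.5, 0.0, 0.0), (V0, 0.0)))
```

Passing a different `d1` gives a simple way to probe the effect of a tanker speed that deviates from the nominal value.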

For completeness, it should be noted that the relative coordinates in the UAV reference frame
and the tanker reference frame are related by a nonlinear coordinate transformation. Specifically,
suppose $x = (x_1, x_2, x_3)$ are the coordinates of the tanker in the UAV reference frame, and
$\bar{x} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)$ are the coordinates of the UAV in the tanker reference frame. Then $\bar{x} = p(x)$, where
$p : \mathbb{R}^3 \to \mathbb{R}^3$ is given by

$$p \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{bmatrix} -x_1 \cos x_3 - x_2 \sin x_3 \\ x_1 \sin x_3 - x_2 \cos x_3 \\ x_3 \end{bmatrix} \tag{2.9}$$

This transformation will be used in transforming target sets and avoid sets specified in tanker
coordinates into UAV coordinates. Specifically, suppose a set $\bar{S}$ is represented by a function $\bar{\phi}$ in
the tanker reference frame (namely $\bar{\phi}(\bar{x}) \le 0$, $\forall \bar{x} \in \bar{S}$). Then the corresponding set $S$ in the UAV
reference frame is represented by the function $\phi = \bar{\phi} \circ p$.


Figure  2.3:  Relative-coordinate  system,  kinematic  model.  The  origin  of  the  coordinate  system  is 
centered  on  the  UAV. 


2.8.3  Hybrid  System  Abstraction  of  AAR 

The sequence of flight maneuvers as described in Table 2.1, along with the kinematics model
of aircraft dynamics as given in (2.8), provides us with an abstraction of the aerial refueling process
in terms of a sequential transition system, as shown in Fig. 2.4. In this model, the transition
states consist of the sequence of flight maneuvers as listed in Table 2.1, while the stationary
states consist of intermediate maneuvers in which the UAV is to remain in a neighborhood of each
waypoint while waiting for operator command. In addition, there are four general purpose escape
maneuvers, labeled $\hat{q}_1$ to $\hat{q}_4$, to handle the case of improper initialization as discussed in section 2.7.
The continuous dynamics within each flight maneuver are identical and given by (2.8). The various
maneuvers differ only by the choice of continuous control laws $K(q_i, \cdot)$, $K(\bar{q}_i, \cdot)$, and $K(\hat{q}_i, \cdot)$,
corresponding to transition maneuvers, stationary maneuvers, and escape maneuvers, respectively.

We  first  consider  the  case  in  which  the  flight  maneuvers  are  to  be  performed  autonomously.  By 
choosing  a  neighborhood  of  states  around  each  waypoint  in  Fig.  2.2  as  a  target  set,  and  by  choosing 
a  protected  zone  around  the  tanker  aircraft  to  be  an  avoid  set,  the  task  of  designing  AAR  can  be 
formulated  as  an  instance  of  Problem  2.1.  As  described  in  section  2.4  and  section  2.6,  through  an 
appropriate  choice  of  switching  policy,  the  design  parameters  become  the  continuous  control  laws 
within  the  respective  flight  maneuvers.  In  the  following,  we  will  discuss  the  specification  of  the 
target  sets  and  avoid  sets,  as  well  as  the  form  of  the  continuous  control  laws. 

2.8.4  Specification  of  Target  Sets  and  Avoid  Sets 

Figure 2.4: Discrete states and transitions in hybrid system abstraction of AAR process.

The target set $R_i$ for each maneuver $q_i$ is chosen to be a disc-shaped neighborhood around each
desired waypoint (see Fig. 2.5), with bounds on the relative heading error. This choice is consistent
with the objective of controlling the aircraft to within some Euclidean distance of a given waypoint.
For waypoint $i$, this set can be specified in tanker coordinates as $R_i = B([-x_{1f}(q_i), -x_{2f}(q_i)], r_0) \times
[-\Delta\theta, \Delta\theta]$, where $B(x_0, r)$ denotes a ball of radius $r$ centered at $x_0$ in $\mathbb{R}^2$. In this case, the radius and
heading tolerance are chosen to be $r_0 = 4$ m and $\Delta\theta = \pi/16$ rad, respectively. The corresponding
set in the UAV coordinate frame is obtained from the transformation $p$ in (2.9).
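
A membership test for such a target set, expressed in tanker coordinates, might look as follows. This is a sketch under stated assumptions: the disc is centered at $(-x_{1f}, -x_{2f})$ with radius 4 m and heading tolerance $\pi/16$ rad as in the text, the set is treated as closed, and the function name `in_target_set` is hypothetical.

```python
import math

R0 = 4.0                    # target radius (m), from the text
DELTA_THETA = math.pi / 16  # heading tolerance (rad), from the text


def in_target_set(x_tanker, waypoint, r0=R0, dtheta=DELTA_THETA):
    """True if a tanker-frame state (x1, x2, x3) lies in the disc-times-interval target set.

    waypoint is (x1f, x2f); the disc is centered at (-x1f, -x2f).
    Treating the set as closed is an assumption.
    """
    x1f, x2f = waypoint
    distance = math.hypot(x_tanker[0] + x1f, x_tanker[1] + x2f)
    return distance <= r0 and abs(x_tanker[2]) <= dtheta


print(in_target_set((-25.5, 1.0, 0.05), (25.5, 0.0)))
print(in_target_set((-25.5, 5.0, 0.0), (25.5, 0.0)))
```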


Figure  2.5:  Target  sets  and  avoid  sets  for  transition  maneuvers  in  AAR  process. 


Each flight maneuver uses an identical avoid set $A$, namely the set of continuous states
corresponding to minimum separation infringement (MSI) violation between the tanker aircraft and
UAV. This set consists of a disc in the $x_1$-$x_2$ plane, with a small neighborhood of states around
the fuel boom removed to allow approach by the UAV. In the tanker reference frame, this is
given by $A = (B([15, 0], a_0) \times [-\pi, \pi]) \setminus V$, $\forall q_i \in Q$, where $a_0 = 30$ m is the protected radius
(chosen based upon published data of the wingspan of a Boeing KC-135 Stratotanker), the
origin of the tanker's coordinate system is 15 m from the centroid of the tanker, and $V$ is a small
hyper-rectangle of states around the boom location, defined in the tanker reference frame as $V =
\{x \in \mathbb{R}^3 : -15 \text{ m} \le x_1 \le 10 \text{ m},\ -8 \text{ m} \le x_2 \le 8 \text{ m},\ -\pi \le x_3 \le \pi\}$. The corresponding avoid set $A$ in
the UAV coordinate frame can be obtained from the coordinate transformation $p$.
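
The avoid set can likewise be written as a membership test in tanker coordinates. The sketch below assumes the disc of radius 30 m centered at $(15, 0)$ and the box $-15 \le x_1 \le 10$, $-8 \le x_2 \le 8$ (meters) described in the text, and it treats both sets as closed. The function names are hypothetical.

```python
import math

A0 = 30.0                   # protected radius (m), from the text
DISC_CENTER = (15.0, 0.0)   # disc center in tanker coordinates (m), from the text


def in_boom_corridor(x_tanker):
    """Box V around the boom location that is removed from the protected disc."""
    x1, x2, x3 = x_tanker
    return -15.0 <= x1 <= 10.0 and -8.0 <= x2 <= 8.0 and -math.pi <= x3 <= math.pi


def in_avoid_set(x_tanker):
    """True if the tanker-frame state lies in the protected disc but not in V."""
    x1, x2, x3 = x_tanker
    in_disc = (math.hypot(x1 - DISC_CENTER[0], x2 - DISC_CENTER[1]) <= A0
               and -math.pi <= x3 <= math.pi)
    return in_disc and not in_boom_corridor(x_tanker)


print(in_avoid_set((15.0, 20.0, 0.0)))   # beside the tanker
print(in_avoid_set((-8.0, 0.0, 0.0)))    # inside the boom corridor
```

The second query corresponds to a point directly behind the tanker inside the removed box, which is what allows an approach to the boom without an MSI violation.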

2.8.5  Structure  of  Continuous  Controllers 

The feedback control laws to perform the various maneuvers are applied through the inputs $u_1$ and
$u_2$. To emulate high-level waypoint following algorithms, proportional control laws are used to
steer the UAV to the various desired waypoints. For transition maneuvers $q_1$ to $q_6$, the feedback
laws $K(q_i, \cdot)$ are of the form

$$u_1 = k_1 (x_1 - x_{1f}) + v_0 \tag{2.10}$$

$$u_2 = k_2 (x_2 - x_{2f}) \tag{2.11}$$

where $k_1$ and $k_2$ are proportional gain constants, and $x_{1f}$, $x_{2f}$ are the desired waypoint locations in
the UAV reference frame. To take into account actuator limitations, the control laws are saturated
to within the input ranges given in section 2.8.2. The control law $K(\bar{q}_i, \cdot)$ for each stationary
maneuver $\bar{q}_i$, $i = 1, \ldots, 6$, is chosen to be identical to that of the preceding transition maneuver $q_i$.

The waypoint locations for the transition and stationary maneuvers are specified in Table 2.2.
During the control design procedure, the proportional gain constants will be selected so as to ensure
the reachability objective of each flight maneuver.
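
The saturated proportional law can be sketched in a few lines. The code below assumes the form $u_1 = k_1 (x_1 - x_{1f}) + v_0$, $u_2 = k_2 (x_2 - x_{2f})$ with the input ranges from section 2.8.2. The helper names `saturate` and `waypoint_control` are hypothetical, and the gains in the usage lines are example values only.

```python
import math

V0 = 84.8                                  # nominal tanker speed (m/s), section 2.8.2
U1_LIMITS = (40.0, 113.0)                  # UAV speed saturation limits (m/s)
U2_LIMITS = (-math.pi / 6, math.pi / 6)    # UAV turn-rate saturation limits (1/s)


def saturate(value, limits):
    """Clip value to the closed interval given by limits = (lower, upper)."""
    lower, upper = limits
    return min(max(value, lower), upper)


def waypoint_control(x, waypoint, k1, k2):
    """Saturated proportional law steering (x1, x2) towards the waypoint (x1f, x2f)."""
    x1f, x2f = waypoint
    u1 = k1 * (x[0] - x1f) + V0
    u2 = k2 * (x[1] - x2f)
    return saturate(u1, U1_LIMITS), saturate(u2, U2_LIMITS)


print(waypoint_control((25.5, 33.5, 0.0), (25.5, 33.5), 3.0, 1.0))
print(waypoint_control((35.5, 34.5, 0.0), (25.5, 33.5), 3.0, 1.0))
```

At the waypoint the law commands the nominal tanker speed and zero turn rate, so with aligned headings the relative position is held. Away from the waypoint both channels can saturate, which is one reason the gains are tuned against the reachable sets rather than by linear analysis alone.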


Maneuver                   Mode label            $x_{1f}$   $x_{2f}$
Detach 1, Stationary 1     $q_1$, $\bar{q}_1$    25.5       33.5
Precontact, Stationary 2   $q_2$, $\bar{q}_2$    25.5       0
Contact, Stationary 3      $q_3$, $\bar{q}_3$    8.0        0
Postcontact, Stationary 4  $q_4$, $\bar{q}_4$    25.5       0
Detach 2, Stationary 5     $q_5$, $\bar{q}_5$    25.5       33.5
Rejoin, Stationary 6       $q_6$, $\bar{q}_6$    4.5        33.5

Table 2.2: Desired waypoint locations for continuous control laws ($x_{1f}$, $x_{2f}$, in meters).

Finally, the control laws $K(\hat{q}_i, \cdot)$ for the four escape maneuvers $\hat{q}_i$, $i = 1, \ldots, 4$, are chosen as
follows:

1. Escape 1 (steer left at max speed): $u_1 = u_{1\max}$, $u_2 = u_{2\max}$;

2. Escape 2 (steer right at max speed): $u_1 = u_{1\max}$, $u_2 = -u_{2\max}$;

3. Escape 3 (slow down): $u_1 = u_{1\min}$, $u_2 = 0$;

4. Escape 4 (speed up): $u_1 = u_{1\max}$, $u_2 = 0$.
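
Because the escape maneuvers apply constant inputs, they reduce to a small lookup table. The sketch below assumes the saturation limits from section 2.8.2 ($u_1 \in [40, 113]$ m/s, $|u_2| \le \pi/6$ s$^{-1}$). The dictionary and function names are hypothetical.

```python
import math

U1_MIN, U1_MAX = 40.0, 113.0   # UAV speed limits (m/s), section 2.8.2
U2_MAX = math.pi / 6           # UAV turn-rate limit (1/s), section 2.8.2

# Constant inputs (u1, u2) for each escape maneuver.
ESCAPE_INPUTS = {
    1: (U1_MAX, U2_MAX),    # Escape 1: steer left at max speed
    2: (U1_MAX, -U2_MAX),   # Escape 2: steer right at max speed
    3: (U1_MIN, 0.0),       # Escape 3: slow down
    4: (U1_MAX, 0.0),       # Escape 4: speed up
}


def escape_control(mode, x=None):
    """Return the constant inputs of an escape maneuver; the state x is ignored."""
    return ESCAPE_INPUTS[mode]


print(escape_control(3))
```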

2.8.6  Control  Design  Using  Capture  Sets  and  Collision  Sets 

We now describe the design of the maneuver control laws, using the procedures given in section 2.6.1.
For the rest of our discussions in this section, unsafe sets will be referred to interchangeably as
collision sets, due to the fact that these sets correspond to initial conditions that could result in a
collision with the tanker aircraft.

For a given flight maneuver $q_i$, we first fix a set of control gains and compute the capture set
with respect to a target set $R_i$ until a time instant $\tau_i$ such that $R_{i-1} \subset \mathcal{R}_{q_i}(R_i, K(q_i, \cdot), \tau_i)$. An unsafe
set computation is then performed to check the safety condition $R_{i-1} \subset \mathcal{A}^C_{q_i}(A, K(q_i, \cdot), \tau_i)$. For
mode $q_1$, the set $R_0$ is specified to be the set of permissible initial states $X_0$, as shown in Fig. 2.5.
The control gains for each flight maneuver are then adjusted as necessary to ensure the target
attainability and safety objectives are met. The set of control gains and maneuver timings obtained
from this design procedure is summarized in Table 2.3.


Maneuver      $k_1$   $k_2$   Time $\tau_i$ (s)   Elapsed time (s)
Detach 1      3       1       1.25                1.25
Precontact    0.5     5       3.00                4.25
Contact       2.5     1       1.00                5.25
Postcontact   2.5     1       1.00                6.25
Detach 2      1       5       3.50                9.75
Rejoin        3       1       1.25                11.0

Table 2.3: Proportional gain constants ($k_1$, $k_2$) and timings ($\tau_i$, in seconds) for transition maneuvers.
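
The elapsed-time column is the running sum of the individual maneuver timings, which can be checked with a short sketch. The list names below are hypothetical; the numbers are those listed in Table 2.3.

```python
from itertools import accumulate

maneuvers = ["Detach 1", "Precontact", "Contact", "Postcontact", "Detach 2", "Rejoin"]
durations = [1.25, 3.00, 1.00, 1.00, 3.50, 1.25]   # maneuver timings in seconds

# Running sum of the maneuver timings gives the elapsed time at the end of each maneuver.
elapsed = list(accumulate(durations))

for name, total in zip(maneuvers, elapsed):
    print(f"{name}: {total:.2f} s")
```

The final entry gives an upper bound of 11 s on the duration of the full sequence when no time is spent in the stationary modes.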

Example capture sets are shown in Fig. 2.6a and Fig. 2.6b for the Contact ($q_3$) and Rejoin ($q_6$)
maneuvers.  It  can  be  seen  that  the  control  laws  for  the  two  maneuvers  are  designed  so  as  to  ensure 
that  the  target  set  of  the  preceding  maneuver  is  contained  within  the  capture  set  of  the  current 
maneuver  and  has  empty  intersection  with  the  collision  set  of  the  current  maneuver,  thus  ensuring 
proper  composition  between  continuous  trajectories  of  successive  flight  maneuvers  in  the  refueling 
sequence. 

2.8.7  Refueling  Sequence  Simulation 

A complete simulation of the refueling sequence is constructed to check the satisfaction of the
safety and target attainability objectives. In this simulation, the UAV does not spend any time in
the stationary modes; namely, a forced transition is taken to the next maneuver in the sequence as
soon as the state of the UAV enters a target set.

Figure 2.6: Capture sets and collision sets for Precontact and Rejoin maneuvers. In each figure,
$x_1$ and $x_2$ represent longitudinal and lateral offset (respectively), and $x_3$ represents the offset in
heading between the UAV and tanker. (a) Capture (light, green) and collision (dark, red) sets for
Precontact. (b) Capture (light, green) and collision (dark, red) sets for Rejoin. (c) Slice of
Precontact capture (dashed) and collision (dotted) sets at $x_3 = 0$. (d) Slice of Rejoin capture
(dashed) and collision (dotted) sets at $x_3 = 0$.

Some snapshots of the simulation are shown in Fig. 2.7, where the capture sets and collision sets
for each flight maneuver are superimposed on the trajectory of the UAV. As guaranteed by the mode
switching conditions, each maneuver is completed within the transition timing given in Table 2.3,
without entering the avoid set $A$ corresponding to MSI. Furthermore, it is verified that whenever the
system state $x$ enters a target set $R_i$ in mode $q_i$, the conditions $x \in \mathcal{R}_{q_{i+1}}(R_{i+1}, K(q_{i+1}, \cdot), \tau_{i+1})$ and
$x \notin \mathcal{A}_{q_{i+1}}(A, K(q_{i+1}, \cdot), \tau_{i+1})$ are satisfied, thus ensuring the feasibility of the next flight maneuver.

2.8.8  Extension  to  Invariance  Objectives 

Here we consider an extension of the controller design to the case in which the specifications
require the UAV to remain in a neighborhood of certain waypoints while waiting for operator
commands to proceed to the next flight maneuver. This falls within the framework of a
semi-automated sequential transition system. In particular, we will focus on the controller design
for Stationary 3 ($\bar{q}_3$), corresponding to when the UAV is expected to be refueling. In this case, it
is necessary that the UAV maintain itself within a neighborhood of the fueling boom while the
boom operator performs the refueling operation.

We specify a target neighborhood for this stationary maneuver in tanker coordinates as $W_3 =
B([-x_{1f}(q_3), -x_{2f}(q_3)], r_1) \times [-\Delta\theta, \Delta\theta]$, where the waypoint location $(x_{1f}, x_{2f})$ is as given in
Table 2.2 for the Contact maneuver, the neighborhood radius is set to $r_1 = 6$ m, and the heading
tolerance is set to $\Delta\theta = \pi/16$ rad. The controller for this maneuver is chosen to be the same as that
designed for the contact maneuver, and an invariant set calculation is performed according to the
procedure described in section 2.5.3. The result is shown in Fig. 2.8. In these plots, the invariant set
satisfies $Inv(W_3, K(\bar{q}_3, \cdot)) \subset \mathcal{R}_{q_4}(R_4, K(q_4, \cdot), \tau_4)$ and $Inv(W_3, K(\bar{q}_3, \cdot)) \cap \mathcal{A}_{q_4}(A, K(q_4, \cdot), \tau_4) = \emptyset$;
namely, it lies within the feasible set of the next maneuver, Postcontact ($q_4$), in the refueling
sequence. In order to ensure that the contact maneuver ends in a state satisfying the invariance
objective, the target set for the contact maneuver can be chosen according to the procedures of
section 2.6.2 as $\tilde{R}_3 = B([-x_{1f}(q_3), -x_{2f}(q_3)], r_0) \times [-\pi/18, \pi/18]$, where $r_0$ is as given in
section 2.8.4.

2.8.9  Scenario  of  Improper  Initialization 

In this section, we formulate a simulation scenario in which the system state is initialized outside
the set $X_0$. This provides an example of improper initialization as discussed in section 2.7 and
will be used to illustrate the use of reachable sets as a tool for guiding human decision making.

The goal in this case is to construct a sequence of escape maneuvers to arrive at the target
set $R_2$ of the Precontact ($q_2$) maneuver, using the collision sets $\mathcal{A}_{\hat{q}_i}(A, K(\hat{q}_i, \cdot), \hat{\tau}_i)$ computed for escape
maneuvers 1-4, as well as the capture and collision sets for the Precontact maneuver. In practice,
this task would be carried out by a trained UAV operator. For this simulation scenario, however,
the maneuver selection is performed by heuristic examination of the generated sets. The results
are shown in Fig. 2.9.
Figure 2.7: Refueling sequence simulation with capture sets (dashed lines), avoid and collision
sets (dotted lines). Panels (axis label: lateral offset in m): (a) Detach 1, simulation time 0.5 s;
(b) Precontact, 1.5 s; (c) Contact, 4.5 s; (d) Postcontact, 5.5 s; (e) Detach 2, 7.0 s; (f) Rejoin, 10.0 s.
Figure 2.8: Results of an invariant set calculation for Stationary 3 maneuver, showing that the
Postcontact maneuver can be safely initiated following refueling. (a) Postcontact capture set
(light, green) and Stationary 3 invariant set (dark, blue). (b) Postcontact collision set (light, red)
and Stationary 3 invariant set (dark, blue).


From the first plot, the UAV is initialized at a location outside the capture set of the Precontact
maneuver ($x_0 \notin \mathcal{R}_{q_2}(R_2, K(q_2, \cdot), \tau_2)$). In fact, this initial condition lies inside the collision set of
the Precontact maneuver ($x_0 \in \mathcal{A}_{q_2}(A, K(q_2, \cdot), \tau_2)$). In examining the collision sets computed for
the escape maneuvers, it is found that $x_0 \notin \mathcal{A}_{\hat{q}_4}(A, K(\hat{q}_4, \cdot), \hat{\tau}_4)$. The recovery maneuver Escape 4
($\hat{q}_4$), corresponding to "speed up", is then selected as a safe flight maneuver. After performing this
maneuver for some time, while consulting the collision sets, it is found that both Escape 2 ($\hat{q}_2$)
and Escape 1 ($\hat{q}_1$) become available, corresponding to "turn right" and "turn left", respectively.
Maneuver Escape 2 is chosen first, followed by maneuver Escape 1 to return the heading to that
of the tanker vehicle. Finally, maneuver Escape 3 is selected, corresponding to "slow down."
While reducing speed, the state of the UAV enters the capture set of the Precontact maneuver
($x \in \mathcal{R}_{q_2}(R_2, K(q_2, \cdot), \tau_2)$), and the UAV mode transitions to the Precontact maneuver, completing
the fault recovery sequence.

2.8.10  Effects  of  Disturbance  on  Reachable  Set  Computation 

In the previous results, capture sets and collision sets are generated assuming a nominal tanker
velocity of $v_0 = 84.8$ m/s. However, during execution time, there may be some degree of uncertainty
associated with the velocity of the tanker, due to unmodeled dynamics and various environment
disturbances (for example wind effects). This uncertainty may not be significant for maneuvers
far enough from the tanker aircraft. However, for the Contact maneuver in which the UAV needs
to come within close proximity of the tanker aircraft, even slight variations in the tanker aircraft
speeds may compromise the safety of the maneuver.

As discussed in section 2.5, the Hamilton-Jacobi method for reachability analysis offers the
flexibility to account for this uncertainty in the tanker aircraft velocity. In this case, the tanker
velocity $d_1$ is allowed to fluctuate in the bounded range [79.14, 90.45] m/s (70-80% of the
maximum allowable velocity of the UAV). The capture set and collision set for the Contact maneuver
under the effects of this disturbance are shown in Fig. 2.10 (a) and (b), along with the same sets
calculated under the nominal tanker velocity.

[Figure 2.9 panels, each plotted against Distance Traveled (m), with legend entries Target Set for
Precontact, Capture Set for Precontact, Avoid Set, Collision Set for Precontact, and Collision Set
for the active escape maneuver: (a) Escape Mode 4 (Speed Up) initiated at t = 0 s; (b) Escape Mode 2
(Steer Right at Max Speed) was initiated at t = 0.5 s, shown here at t = 1.0 s; (c) Escape Mode 1
(Steer Left at Max Speed) initiated just before t = 1.25 s, shown at t = 1.5 s; (d) Escape Mode 3
(Slow down), shown at t = 3 s; (e) Performing Precontact, shown here at t = 5 s; (f) Precontact
completed, shown here at t = 8 s.]

Figure 2.9: Fault recovery sequence simulation with capture set for Precontact (dashed lines), and
collision sets for Precontact (dotted lines) and escape maneuvers (dash-dotted lines).


[Figure 2.10 panels: (a) Contact maneuver capture set without uncertainty (outer line) and with
uncertainty (inner line); (b) Contact maneuver collision set with uncertainty (outer line),
collision set without uncertainty (middle line), and the MSI (innermost line).]


Figure  2.10:  Capture  set  and  collision  set  for  Contact  maneuver  under  worst-case  tanker  speed. 


As  expected,  the  capture  set  with  added  uncertainty  is  smaller  than  that  without  uncertainty, 
shown  in  Fig.  2.10a.  This  is  due  to  the  fact  that  under  worst  case  tanker  aircraft  speed  input, 
the  tanker  is  effectively  trying  to  prevent  the  UAV  from  entering  the  refueling  zone.  Similarly, 
the  worst  case  collision  set  under  uncertainty,  shown  in  Fig.  2.10b,  is  larger  than  that  without 
uncertainty.  This  results  from  the  worst  case  tanker  speed  input  which  effectively  tries  to  force  a 
collision  with  the  UAV. 
Chapter 3

Controller  Synthesis  Algorithms  for  Safety 
and  Reach-avoid  Problems 

3.1  Overview  and  Related  Work 

This chapter presents several computational algorithms for the synthesis of feedback control
policies to satisfy safety and reach-avoid objectives for switched nonlinear systems. In particular,
we consider an abstraction for high level control of a physical system as described in terms of a
discrete decision process selecting amongst a finite set of continuous behaviors (e.g. maneuvers
of an aircraft, motions of a robot, gears of an automobile), where the continuous behaviors are
characterized by a nonlinear vector field, up to some bounded continuous disturbances. Using this
abstraction, controller synthesis techniques are formulated to satisfy two types of specifications:

• safety: keep the system state within a prescribed safe set in the hybrid state space over a finite
or infinite time horizon;

• reach-avoid: drive the system state into a prescribed target set in the hybrid state space
within finite time, subject to a constraint that the state trajectory avoids an unsafe set.

A  control  policy  resulting  from  the  controller  synthesis  algorithms  consists  of  a  set  of  feasible 
initial  conditions,  as  well  as  a  set-valued  feedback  law  defined  on  the  feasible  set. 

It is important to note that the switched system model employed in this chapter has a very
different interpretation from the switched system model of the preceding chapter. Whereas the
discrete states of a sequential transition system are used to represent the phases of a dynamic
process, the discrete states here are used to represent the set of continuous behaviors that a high
level controller can select from at any given time. In other words, the model of the preceding chapter
represents a discretization or aggregation over the temporal space, while the model of the current
chapter represents a discretization or aggregation over the control space. From this perspective,
one can view the controller synthesis algorithms described here as a method for performing control
design in each phase of the sequential reachability specification.
Safety and reach-avoid problems for switched nonlinear systems as considered in this chapter
are special cases of hybrid reachability problems. As alluded to in the introduction and the
preceding chapter, numerous theoretical and computational tools have been developed to address
such problems over the past two decades, under varying assumptions on the hybrid system
dynamics. Early efforts focused on timed automata and linear hybrid automata (Alur and Dill, 1994;
Maler et al., 1995; Henzinger et al., 1997; Larsen et al., 1997; Yovine, 1997; Henzinger et al.,
1998; Alur et al., 2000), in which case the simplicity of continuous dynamics allows an exact
discrete abstraction of the hybrid system and the adaptation of discrete synthesis techniques to hybrid
reachability problems. Although exact solutions to these problems are still possible for certain
classes of hybrid systems with linear and nonlinear dynamics (Lafferriere et al., 1999;
Shakernia et al., 2001; Del Vecchio, 2009), the consideration of general forms of continuous dynamics
often requires approximation techniques. This has led to considerable research into methods for
computing approximate continuous reachable sets.

One class of methods propagates explicit set representations such as polyhedra (Asarin et al.,
2000a; Bemporad et al., 2000b; Chutinan and Krogh, 2003; Hwang et al., 2005; Han and Krogh,
2006), ellipsoids (Kurzhanski and Varaiya, 2000; Botchkarev and Tripakis, 2000), or zonotopes
(Girard, 2005; Girard and Le Guernic, 2008) directly under flows of the system. Methods in this
class typically consider linear systems or feedback linearizable nonlinear systems. Another class
of methods approximates sets using representations defined on a discretized grid of the continuous
state space, including approaches based upon viability theory (Cardaliaguet et al., 1999; Aubin et
al., 2002; Saint-Pierre, 2002; Gao et al., 2007) and viscosity solutions of Hamilton-Jacobi
equations (Mitchell et al., 2005; Mitchell, 2011). These approaches tend to be more general in the
types of reachability computation and system dynamics that can be handled, but are often more
computationally intensive.

Parallel to these efforts, which compute sets directly in the continuous state space, techniques
have been proposed for computing approximate discrete abstractions of hybrid systems, which
allow the application of existing methods in discrete verification and supervisory control (Tiwari
and Khanna, 2002, 2004; Kloetzer and Belta, 2006; Tabuada, 2008; Girard et al., 2010). Although
computationally efficient for classes of systems with polynomial or affine dynamics, current
instantiations of these techniques feature a similar type of growth in complexity as the viability and
Hamilton-Jacobi approaches when general nonlinear systems are considered. Finally, for purposes
of verifying safety, a computational technique based upon Lyapunov type analysis has also been
proposed in Prajna et al. (2007) for systems with polynomial dynamics.

The reachable set computation and controller synthesis techniques described in this chapter are
based upon the game theoretic framework for hybrid controller design as outlined in Lygeros et
al. (1999b) and Tomlin et al. (2000), which formulates hybrid reachability problems as zero-sum
dynamic games between the control and rational disturbances. The strength of this framework
lies in its consideration of general forms of nonlinear continuous dynamics, as well as dynamic
uncertainty modeled by bounded disturbance terms. The development of level set techniques for
computing approximate solutions to Hamilton-Jacobi-Isaacs (HJI) equations (Mitchell et al., 2005)
also promises accurate numerical computation of reachable sets, although restricted to systems of
up to five continuous state dimensions due to a grid-based approximation. However, with the
difficulties inherited from analyzing nonlinear systems, continuous feedback policies in general cannot
be derived in closed form. Furthermore, due to the interdependence of discrete and continuous
dynamics, the problem of designing discrete controls also does not yield readily to automated
synthesis algorithms. Thus, applications of this framework to practical problems such as automated
highway platooning (Lygeros et al., 1998), flight envelope protection (Bayen et al., 2007), and
aircraft conflict resolution (Tomlin et al., 2001) are often performed on a case by case basis with
considerable insight from the control designer.

There has been an ongoing effort to develop computational algorithms for the synthesis of
decision policies using Hamilton-Jacobi reachable sets. In Teo and Tomlin (2003), the authors describe
a method for selecting evasive maneuvers for closely-spaced runway approaches, by checking the
aircraft state information against unsafe sets computed for the evasive maneuvers. Due to the
need for fast online computation of reachable sets, the computation method is tailored to the
particular application and assumes an open-loop selection of inputs. Another method is discussed in
Hwang et al. (2005) for extracting control inputs from polytopic approximations of reachable sets
for certain classes of nonlinear systems. An optimal selection of input, however, can only be
determined along the boundaries of the approximating polytope, and may lead to chattering effects. The
work described in Oishi et al. (2006) proposes an approach for selecting feedback linearizing
control laws to achieve stabilization under safety constraints, based upon the results of a reachability
calculation. However, issues of implementation and guarantees of safety and target attainability in
applications with sampled state measurements and piecewise constant controls were not addressed.

The main contributions of our proposed methodology are as follows. First, we provide
systematic procedures for the numerical computation of feedback control policies to satisfy hybrid
reachability specifications for switched nonlinear systems. In particular, several reachability
algorithms are proposed such that the output of each algorithm includes both a set of initial conditions
on which a given reachability objective is feasible, as well as a set-valued control law represented in
terms of a collection of reachable sets. Second, we carry out analysis and synthesis tasks within the
framework of a sampled-data system model. This ensures that the controllers computed through
our algorithms will preserve the desired reachability specifications in continuous time even as state
measurements and applications of control actions may be constrained to take place at sampling
instants. Finally, we give detailed algorithms for the online selection of control inputs using the
results of the offline reachability computations. These algorithms represent possible approaches
to practically implement reachability-based controllers, by storing numerical representations of
reachable sets and accessing them in an online setting as lookup tables.
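
To make the lookup-table idea concrete, the following Python sketch shows one possible storage and
query scheme. It is an illustration under assumptions, not the algorithm developed later in the
chapter: the class name ReachableSetTable, the Boolean-grid storage, and the nearest-cell lookup are
all hypothetical choices made here for brevity.

```python
import numpy as np


class ReachableSetTable:
    """Hypothetical lookup table: one Boolean grid per candidate input (sigma, u).

    A True entry means the grid cell belongs to the reachable set stored for
    that input, so the input is admissible from states falling in that cell.
    """

    def __init__(self, lower, upper, shape):
        self.lower = np.asarray(lower, dtype=float)  # grid lower corner
        self.upper = np.asarray(upper, dtype=float)  # grid upper corner
        self.shape = tuple(shape)                    # grid points per dimension
        self.tables = {}                             # (sigma, u) -> Boolean array

    def add(self, sigma, u, membership):
        membership = np.asarray(membership, dtype=bool)
        if membership.shape != self.shape:
            raise ValueError("membership grid has the wrong shape")
        self.tables[(sigma, u)] = membership

    def cell(self, x):
        """Index of the grid point nearest to x (clipped to the grid)."""
        n = np.array(self.shape) - 1
        frac = (np.asarray(x, dtype=float) - self.lower) / (self.upper - self.lower)
        idx = np.clip(np.rint(frac * n).astype(int), 0, n)
        return tuple(idx)

    def admissible_inputs(self, x):
        """Set-valued feedback law: all stored inputs whose set contains x."""
        c = self.cell(x)
        return {key for key, grid in self.tables.items() if grid[c]}
```

The set returned by admissible_inputs plays the role of a set-valued feedback law evaluated at a
sampled state. A nearest-cell lookup is a simplification; an implementation that must preserve
formal guarantees would need to account for the grid approximation error.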

The organization of this chapter is as follows. In section 3.2, we give a formal description of
the sampled-data switched system model. In section 3.3, we formulate the safety and reach-avoid
control problems within the context of this switched system model. In section 3.4, we provide
a controller synthesis algorithm for the safety control problem, along with a numerical example
of aircraft collision avoidance. In section 3.5, we propose a solution for the finite horizon
reach-avoid problem, and illustrate the methodology through an experimental application on a quadrotor
platform in section 3.6. Finally, we revisit the AAR example in section 3.7, in order to discuss the
application of the proposed computational algorithms to switching controller design in sequential
reachability problems.
3.2 Sampled-Data Switched System Model

We  model  a  sampled-data  switched  system  as  a  special  case  of  the  hybrid  automaton  discussed  in 
section  2.3.1. 

Definition 3.1 (Sampled-Data Switched System). A sampled-data switched system is a tuple
$\mathcal{H}_{sw} = (Q, X, \Sigma, U, D, \delta, f, T)$, defined as follows.

• Discrete state space $Q := \{q_1, q_2, \ldots, q_m\}$, $m \in \mathbb{N}$.

• Continuous state space $X := \mathbb{R}^n$.

• Discrete input space $\Sigma := \{\sigma_1, \sigma_2, \ldots, \sigma_{n_\sigma}\}$.

• Continuous input space $U$, a compact subset of $\mathbb{R}^{n_u}$.

• Disturbance input space $D$, a compact subset of $\mathbb{R}^{n_d}$.

• Discrete transition function $\delta : Q \times \Sigma \to Q$, describing the discrete state evolution.

• Vector field $f : Q \times X \times U \times D \to \mathbb{R}^n$, describing the continuous state evolution.
It is assumed that $f$ is uniformly continuous and bounded, and that for fixed $q \in Q$, $u \in U$, and
$d \in D$, the function $x \mapsto f(q, x, u, d)$ is Lipschitz continuous.

• Sampling interval $T > 0$.

As noted previously, the discrete states of this model have a different interpretation from the
discrete states of the sequential transition system models given in sections 2.3.2 and 2.3.3.
Specifically, the discrete state space $Q$ of $\mathcal{H}_{sw}$ can be viewed as a set of operation modes
that are provided as control choices to a high level controller, while the discrete state space of a
sequential transition system can be viewed as a set of temporal phases in a dynamic process.

Informally, an execution of a sampled-data switched system proceeds as follows. At each
sampling instant $kT$, we receive measurements of the system state $(q(kT), x(kT))$, and select based
upon this information a discrete input $\sigma(kT) \in \Sigma$ and a continuous input $u(kT) \in U$, which
are held constant on the sampling interval $[kT, (k+1)T)$. In response, the disturbance is allowed to
select a realization $d : [kT, (k+1)T) \to D$. Given the switching command $\sigma(kT)$, the discrete state
transitions to $\delta(q(kT), \sigma(kT)) \in Q$. The continuous state then evolves according to the vector
field in the updated discrete state:

$\dot{x}(t) = f(\delta(q(kT), \sigma(kT)), x(t), u(kT), d(t))$, (3.1)

for $t \in [kT, (k+1)T]$. Under the assumptions placed upon the vector field $f$, the existence and
uniqueness of solutions to (3.1) is assured on each sampling interval. At the next time step, the
discrete state is then given by $q((k+1)T) = \delta(q(kT), \sigma(kT))$, while the continuous state is given
by $x((k+1)T)$ as obtained from the solution to (3.1), and the same process repeats.

More precisely, we allow control inputs to be chosen according to a set-valued feedback law
defined as follows.

Definition 3.2 (Control Policy). A control policy for $\mathcal{H}_{sw}$ is a sequence
$\mu = (\mu_0, \mu_1, \ldots)$ of maps $\mu_k : Q \times X \to 2^{\Sigma \times U} \setminus \emptyset$. We denote
the set of such admissible control policies by $\mathcal{M}$.

In particular, the feedback map $\mu_k$ provides a set of possible control inputs given a sampled
state measurement $(q(kT), x(kT))$.

Under the worst-case assumption that the disturbance may be a rational adversary, we model a
disturbance strategy for $\mathcal{H}_{sw}$ as a sequence of maps from the state and control input space to
the set of admissible disturbance realizations. More specifically, consider the set of functions

$\mathcal{D}_T = \{d : [0, T] \to D \mid d(\cdot) \text{ is measurable}\}$.

Definition 3.3 (Disturbance Strategy). A disturbance strategy for $\mathcal{H}_{sw}$ is a sequence
$\gamma = (\gamma_0, \gamma_1, \ldots)$ of maps $\gamma_k : Q \times X \times \Sigma \times U \to \mathcal{D}_T$. We denote
the set of such admissible disturbance strategies by $\Gamma$.


We  can  now  give  a  formal  definition  for  the  executions  of  a  sampled-data  switched  system 
under  fixed  choices  of  control  policy  and  disturbance  strategy. 

Definition 3.4 (Switched System Execution). For a given initial condition $(q_0, x_0) \in Q \times X$,
control policy $\mu \in \mathcal{M}$, disturbance strategy $\gamma \in \Gamma$, and time horizon $N > 0$, the
execution of a sampled-data switched system $\mathcal{H}_{sw}$ on $[0, NT]$ is a function
$(q, x) : [0, NT] \to Q \times X$ as returned by the following algorithm.


Algorithm 3.2.1 Switched System Execution

Require: Initial condition $(q_0, x_0) \in Q \times X$, control policy $\mu \in \mathcal{M}$, and disturbance
strategy $\gamma \in \Gamma$.

Set $q(0) \Leftarrow q_0$, $x(0) \Leftarrow x_0$;

for $k = 0$ to $N$ do

    Choose $(\sigma(kT), u(kT)) \in \mu_k(q(kT), x(kT))$;

    Set $d = \gamma_k(q(kT), x(kT), \sigma(kT), u(kT))$;

    Set $q(t) = \delta(q(kT), \sigma(kT))$ for $t \in (kT, (k+1)T]$;

    Set $x(t)$, $t \in [kT, (k+1)T]$ as the solution of

        $\dot{x}(t) = f(\delta(q(kT), \sigma(kT)), x(t), u(kT), d(t - kT))$;

end for

return $(q(t), x(t))$, $t \in [0, NT]$.
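
As a rough numerical companion to Algorithm 3.2.1, the Python sketch below approximates an
execution by integrating each sampling interval with forward Euler steps. The function name
simulate_execution, the callable signatures, and the substeps parameter are assumptions introduced
here for illustration; the sketch iterates over the N sampling intervals that make up [0, NT] and
replaces the exact solution of the differential equation with a fixed-step approximation.

```python
import numpy as np


def simulate_execution(q0, x0, policy, disturbance, delta, f, T, N, substeps=50):
    """Approximate an execution of a sampled-data switched system on [0, N*T].

    policy(k, q, x)            -> non-empty collection of admissible (sigma, u) pairs
    disturbance(k, q, x, s, u) -> function d: [0, T] -> D for the current interval
    delta(q, sigma)            -> next discrete state
    f(q, x, u, d)              -> time derivative of the continuous state
    """
    q = q0
    x = np.asarray(x0, dtype=float)
    h = T / substeps                      # Euler step inside one sampling interval
    ts, qs, xs = [0.0], [q], [x.copy()]
    for k in range(N):
        # Any element of the set-valued feedback law is acceptable; take the first.
        sigma, u = next(iter(policy(k, q, x)))
        # The disturbance may respond to the chosen control input.
        d = disturbance(k, q, x, sigma, u)
        # The discrete state switches at the sampling instant and is then held.
        q = delta(q, sigma)
        for i in range(substeps):
            x = x + h * np.asarray(f(q, x, u, d(i * h)), dtype=float)
            ts.append(k * T + (i + 1) * h)
            qs.append(q)
            xs.append(x.copy())
    return np.array(ts), qs, np.array(xs)
```

The control pair is held constant over each interval while the disturbance is re-evaluated at every
Euler step, mirroring the asymmetry between the piecewise constant control and the time-varying
disturbance realization.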


By the above definition, the discrete state trajectory $q(\cdot)$ is piecewise constant with jumps
occurring at sampling instants, while $x(\cdot)$ is a continuous function of time. Furthermore, it should
be emphasized that although the control values are to be held constant on sampling intervals (as
consistent with a sampled-data setting), the disturbance is allowed to choose a time-varying
realization on each sampling interval, possibly in response to the control input selection. This allows
us to treat a range of robust control problems and differential game problems in which the noise or
disturbance entering into the continuous dynamics may be adversarial under worst-case
assumptions.

3.2.1  Example  -  Pairwise  Aircraft  Conflict  Resolution 

In the following, we will illustrate this modeling framework through an example of aircraft conflict
resolution, as adapted from Tomlin et al. (2000), Mitchell et al. (2005), and Hwang et al. (2005), in
which it is used as a benchmark for hybrid and nonlinear reachability analysis. A model similar to
the one presented here has been employed in Teo and Tomlin (2003) for an experimentally demonstrated
conflict detection and resolution algorithm for closely-spaced parallel runway approaches.

The  conflict  scenario  involves  two  aircraft  moving  in  the  plane,  one  of  which  is  controlled 
(referred  to  as  aircraft  1),  while  the  other  is  uncontrolled  (referred  to  as  aircraft  2).  The  task  is  to 
synthesize  the  controls  for  aircraft  1  so  as  to  avoid  a  collision  with  aircraft  2,  subject  to  the  worst- 
case  controls  of  aircraft  2.  The  relative  motion  of  aircraft  2  with  respect  to  aircraft  1  is  modeled 
using  the  following  kinematic  equations. 


$$\dot{x} = \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix}
= \begin{bmatrix} -u_1 + d_1 \cos x_3 + u_2 x_2 \\ d_1 \sin x_3 - u_2 x_1 \\ d_2 - u_2 \end{bmatrix}
= f(x, u_1, u_2, d_1, d_2)$$


where $x_1$, $x_2$ is the relative position of aircraft 2 in the aircraft 1 reference frame; $x_3$ is the
relative heading of aircraft 2 in the aircraft 1 reference frame; $u_1$, $d_1$ are the linear velocities of
aircraft 1 and 2, respectively; $u_2$, $d_2$ are the angular velocities of aircraft 1 and 2, respectively.
Now consider a simplified model for the control system of aircraft 1 as represented by the state
transition diagram shown in Figure 3.1.


Figure  3.1:  Two  mode  control  system  for  aircraft  conflict  resolution. 


For this particular example, the discrete state space is $Q = \{q_1, q_2\}$, where $q_1$ is a straight and
level flight maneuver in which aircraft 1 modifies its linear velocity, while $q_2$ is a turning maneuver
in which aircraft 1 modifies its angular velocity. The continuous state space within each discrete
state is $X = \mathbb{R}^3$. The set of discrete inputs is given by $\Sigma = \{\sigma_1, \sigma_2\}$, where
$\sigma_i$ corresponds to a switch to mode $q_i$. This gives rise to the discrete transition function

$$\delta(q, \sigma) = \begin{cases} q_1, & \sigma = \sigma_1 \\ q_2, & \sigma = \sigma_2. \end{cases}$$

In mode $q_1$, the linear velocity input $u_1$ takes on a range $U_1 = [\underline{v}, \overline{v}] \subset \mathbb{R}$,
while the angular velocity is held constant at some value $u_2 = \omega_0$. In mode $q_2$, the angular
velocity input $u_2$ takes on a range $U_2 = [\underline{\omega}, \overline{\omega}] \subset \mathbb{R}$, while the
linear velocity is held constant at some value $u_1 = v_0$. This results in the continuous input space
$U = U_1 \times U_2 \subset \mathbb{R}^2$. Similarly, the inputs $d_1$, $d_2$ of aircraft 2 are assumed to be
chosen from compact sets $D_1, D_2 \subset \mathbb{R}$, resulting in the disturbance input space
$D = D_1 \times D_2 \subset \mathbb{R}^2$. The corresponding vector fields in modes $q_1$ and $q_2$ are given by

$$f(q, x, u, d) = \begin{cases} f(x, u_1, \omega_0, d_1, d_2), & q = q_1 \\ f(x, v_0, u_2, d_1, d_2), & q = q_2. \end{cases}$$
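
The two-mode vector field above can be written out directly in code. The Python sketch below is
only an illustration: the function names and the numerical defaults V0 and OMEGA0 are hypothetical,
since the values of $v_0$ and $\omega_0$ are not specified in this section.

```python
import numpy as np

V0 = 5.0      # hypothetical constant linear velocity of aircraft 1 in mode q2
OMEGA0 = 0.0  # hypothetical constant angular velocity of aircraft 1 in mode q1


def relative_dynamics(x, u1, u2, d1, d2):
    """Relative kinematics of aircraft 2 in the reference frame of aircraft 1."""
    x1, x2, x3 = x
    return np.array([
        -u1 + d1 * np.cos(x3) + u2 * x2,  # relative position, first coordinate
        d1 * np.sin(x3) - u2 * x1,        # relative position, second coordinate
        d2 - u2,                          # relative heading
    ])


def switched_vector_field(q, x, u, d, v0=V0, omega0=OMEGA0):
    """Mode-dependent vector field f(q, x, u, d) for modes q in {1, 2}."""
    u1, u2 = u
    d1, d2 = d
    if q == 1:
        # Straight flight: linear velocity is the free input, turn rate is fixed.
        return relative_dynamics(x, u1, omega0, d1, d2)
    if q == 2:
        # Turning flight: turn rate is the free input, linear velocity is fixed.
        return relative_dynamics(x, v0, u2, d1, d2)
    raise ValueError("unknown mode: %r" % (q,))
```

In each mode, the component of u that is not under control is simply ignored and replaced by the
corresponding constant, which is one way to keep a single input space $U = U_1 \times U_2$ for both modes.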


3.3  Problem  Formulations 

Within the framework of sampled-data switched systems, we will consider two controller
synthesis problems which commonly arise in safety-critical control applications. In the safety control
problem, the objective is to synthesize a control policy $\mu$, such that the closed-loop system
trajectory $(q(\cdot), x(\cdot))$ remains within a prescribed safe set at all times. This could be, for example, a
flight envelope protection problem for an aircraft. On the other hand, in the reach-avoid control
problem, the objective is to synthesize a control policy, such that the closed-loop trajectory enters
a target set within finite time while remaining outside an unsafe set. This could be, for example,
an autonomous navigation problem, in which the target set is a goal region, while the unsafe set is
comprised of the obstacles in the environment.

To be more precise, suppose we are given a sampled-data switched system $\mathcal{H}_{sw}$ and a
collection of sets $W_i \subseteq \mathbb{R}^n$ which specifies the safe set within each mode $q_i \in Q$. Then
the set of safe states for $\mathcal{H}_{sw}$ is given by $W^H = \bigcup_{i=1}^{m} \{q_i\} \times W_i$. A formal
statement of the safety control problem is as follows.

Problem 3.1. Given a sampled-data switched system $\mathcal{H}_{sw}$, time horizon $N > 0$, and safe set $W^H$:

1. Compute a set of states $G^H_{Safe} \subseteq Q \times X$ such that there exists an admissible control policy
$\mu \in \mathcal{M}$ so that for any initial condition $(q_0, x_0) \in G^H_{Safe}$ and disturbance strategy
$\gamma \in \Gamma$, the closed-loop state trajectory $(q(\cdot), x(\cdot))$, as defined by Algorithm 3.2.1,
satisfies $(q(t), x(t)) \in W^H$ for all $t \in [0, NT]$;

2. Synthesize a control policy $\mu \in \mathcal{M}$ such that the above conditions are satisfied for each
initial condition in $G^H_{Safe}$.
For the rest of this paper, we will refer to $G^H_{Safe}$ as a horizon-N safe set, and a control policy
which satisfies the safety specification on $G^H_{Safe}$ as a horizon-N safe control policy with respect to
$G^H_{Safe}$. By letting $N \to \infty$, we can consider an infinite horizon version of this problem in which
the objective is to keep the system state within a set $W^H$ for every $t \geq 0$. This may be of interest,
for example, in a robust stabilization application with specifications of maintaining the closed-loop
trajectory within a region around the origin.

Now suppose instead that we are given a target set $R^H = \bigcup_{i=1}^{m} \{q_i\} \times R_i$ that the system
state is required to reach within some finite time horizon $[0, NT]$, and an avoid set
$A^H = \bigcup_{i=1}^{m} \{q_i\} \times A_i$ that the system state is required to stay away from at all times.
Then the reach-avoid problem can be formulated as follows.

Problem 3.2. Given a sampled-data switched system $\mathcal{H}_{sw}$, time horizon $N$, target set $R^H$ and
avoid set $A^H$:

1. Compute a set of states $G^H_{RA} \subseteq Q \times X$ such that there exists an admissible control policy
$\mu \in \mathcal{M}$ so that for any initial condition $(q_0, x_0) \in G^H_{RA}$ and disturbance strategy
$\gamma \in \Gamma$, the closed-loop state trajectory $(q(\cdot), x(\cdot))$, as defined by Algorithm 3.2.1,
satisfies $(q(kT), x(kT)) \in R^H$ for some $k \in \{0, 1, \ldots, N\}$, and $(q(t), x(t)) \notin A^H$ for all
$t \in [0, kT]$;

2. Synthesize a control policy $\mu \in \mathcal{M}$ such that the above conditions are satisfied for each
initial condition in $G^H_{RA}$.

As before, we will refer to $G^H_{RA}$ as a horizon-N reach-avoid set, and a control policy which
satisfies the reach-avoid specification on $G^H_{RA}$ as a horizon-N reach-avoid control policy with
respect to $G^H_{RA}$.
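
The two specifications can be checked on a simulated trajectory with a few lines of Python. This is
a minimal sketch under assumptions: the function names are hypothetical, the trajectory is assumed to
be stored with a fixed number of samples (substeps) per sampling interval, and set membership is
tested only at stored samples, so the check approximates the continuous-time conditions rather than
proving them.

```python
def satisfies_safety(qs, xs, in_safe):
    """Safety: every stored sample (q, x) lies in the safe set."""
    return all(in_safe(q, x) for q, x in zip(qs, xs))


def satisfies_reach_avoid(qs, xs, in_target, in_avoid, substeps, N):
    """Reach-avoid: the target is hit at some sampling instant kT with k <= N,
    and the avoid set is not entered at any stored sample on [0, kT]."""
    checked = 0  # samples already verified to lie outside the avoid set
    for k in range(N + 1):
        idx = k * substeps  # index of the sample taken at time kT
        if idx >= len(xs):
            break
        # The avoid condition must hold on all of [0, kT].
        while checked <= idx:
            if in_avoid(qs[checked], xs[checked]):
                return False
            checked += 1
        # The target condition is tested only at sampling instants.
        if in_target(qs[idx], xs[idx]):
            return True
    return False
```

Note the asymmetry carried over from Problem 3.2: passing through the target between two sampling
instants does not count as reaching it, whereas entering the avoid set between sampling instants
does count as a violation.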

Our approach to the safety and reach-avoid problems consists of first computing an
approximate representation of the set $G^H_{Safe}$ or $G^H_{RA}$ through an iterative reachability algorithm,
and then deriving a feedback control policy $\mu$ in terms of the collections of reachable sets returned by
the reachability computation. In order to facilitate the computation of continuous reachable sets, as
well as to ensure a finite representation of the feedback policy, we will impose the following set of
assumptions.

Assumption 3.1.

1. The continuous input space $U$ is discretized into a finite set $\hat{U} \subset U$, called a quantized
input set.

2. For each mode $q_i$, the unsafe set $W_i^C \subset X$ is closed and can be represented by a bounded and
Lipschitz continuous function $\phi_{W_i^C} : X \to \mathbb{R}$ such that

$W_i^C = \{x \in X, \phi_{W_i^C}(x) \leq 0\}$.

3. For each mode $q_i$, the target set $R_i$ and the avoid set $A_i$ are closed and can be represented by
bounded and Lipschitz continuous functions $\phi_{R_i} : X \to \mathbb{R}$ and $\phi_{A_i} : X \to \mathbb{R}$ such that

$R_i = \{x \in X, \phi_{R_i}(x) \leq 0\}$,

$A_i = \{x \in X, \phi_{A_i}(x) \leq 0\}$.

In Assumption 3.1, a quantized input set is specified in order to obtain a finite representation
of the control policy, in the form of a finite collection of reachable sets. In certain application
scenarios, this may also have a practical significance in that, due to digital implementation or
high level abstractions, the range of control choices may be quantized. Furthermore, when the
continuous dynamics (3.1) is affine in the control input, the optimal controls for time-optimal
steering problems, which have close relationships to reachability problems, can sometimes be
chosen from within a finite set on the boundary of the input set (known as the bang-bang principle).
In particular examples of nonlinear systems, it may also be possible to prove such properties using
an optimal control argument (Bayen et al., 2007). For problems of this type, the quantization levels
can be chosen according to the set of optimal inputs.
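
A uniform grid is one simple way to build a quantized input set for a box-shaped input space. The
Python sketch below is an assumption-laden illustration (the function name quantize_box and the
uniform spacing are choices made here, not prescribed by the text); with two levels per dimension it
returns only the extreme values, which is the kind of finite set suggested by the bang-bang principle.

```python
import itertools

import numpy as np


def quantize_box(lower, upper, levels):
    """Return a finite subset of the box [lower, upper] as a list of tuples.

    levels[i] is the number of evenly spaced values used in dimension i;
    both end points are included whenever levels[i] >= 2.
    """
    axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(lower, upper, levels)]
    return [tuple(float(v) for v in point) for point in itertools.product(*axes)]
```

For instance, calling quantize_box with a single input interval and two levels yields just the two
end points of that interval, while more levels trade a larger collection of reachable sets for a
finer set of control choices.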

On the other hand, the assumptions on the unsafe set, target set, and avoid set are necessary for the numerical computation of reachable sets through level set methods (Mitchell et al., 2005). The functions $\phi_{W_i^C}$, $\phi_{R_i}$, and $\phi_{A_i}$ are commonly referred to as the level set representations of the sets $W_i^C$, $R_i$, and $A_i$, respectively. As an example, consider the pairwise aircraft conflict resolution scenario described in the previous section, and suppose that the collision zone is specified as a disc of radius $r_0$ centered on aircraft 1. Then a level set representation of the unsafe set $W_i^C$ is simply given by

$$\phi_{W_i^C}(x) = \sqrt{x_1^2 + x_2^2} - r_0, \quad i = 1, 2.$$
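The disc-shaped collision zone can be written down directly as a level set function. The following is a minimal sketch, assuming the state is passed as a sequence `(x1, x2, x3)` with the relative position in the first two components; the function names are hypothetical and not taken from the original work.

```python
import math


def phi_collision(x, r0):
    """Level set function of a disc of radius r0 in the (x1, x2) plane.

    x is a sequence (x1, x2, x3); the third component does not enter
    the function. Non-positive values mean "inside the collision zone".
    """
    return math.hypot(x[0], x[1]) - r0


def in_unsafe_set(x, r0):
    """Membership test for the closed set {x : phi(x) <= 0}."""
    return phi_collision(x, r0) <= 0.0
```

With `r0 = 5.0`, the point `(3.0, 4.0, 0.0)` lies exactly on the boundary and is therefore counted as unsafe, since the set is closed.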


3.4  Safety  Controller  Synthesis 

In  this  section,  we  discuss  a  solution  approach  to  the  safety  control  problem  under  Assumption 
3.1.  First,  an  algorithm  is  constructed  for  computing,  in  an  offline  setting,  a  set  of  feasible  initial 
conditions  for  the  safety  problem  using  a  restricted  class  of  control  policies  whose  range  lies  within 
the  quantized  input  set.  Second,  it  is  shown  that  the  result  of  this  reachability  computation  gives 
a  representation  for  a  control  policy  satisfying  the  safety  objective,  and  an  algorithm  is  given  for 
implementing  this  policy  in  an  online  setting.  Towards  the  end  of  the  section,  we  also  discuss 
extensions  of  this  methodology  to  the  infinite  horizon  case. 

For notational convenience, we denote by $\hat{\mathcal{M}} \subset \mathcal{M}$ the subset of control policies which select continuous control inputs from the quantized input set $\hat{U}$.

3.4.1  Safe  Set  Computation 

Let $\mathcal{H}_{sw} = (Q, X, \Sigma, U, D, \delta, f, T)$ be a sampled-data switched system defined as in Section 3.2. For a fixed $q_i \in Q$ and $u \in \hat{U} \subset U$, consider a continuous state reachability operator $\mathcal{A}_T^{q_i,u}$, which takes as its argument a set $G \subset X$ and produces as its output the set of states that can be forced inside $G$ within $[0, T]$ by some realization of the disturbance:

$$\mathcal{A}_T^{q_i,u}(G) = \{x_0 \in X : \exists d(\cdot) \in \mathcal{D}_T,\ \exists t \in [0, T],\ x(t) \in G\}, \tag{3.2}$$

where $x(\cdot)$ is the solution of the ODE

$$\dot{x}(t) = f(q_i, x(t), u, d(t)), \quad x(0) = x_0$$

on the interval $[0, T]$.

The computation of this set can be viewed within either an optimal control or a differential game framework. Under an optimal control interpretation, the disturbance is assumed to select a worst-case realization $d(\cdot) \in \mathcal{D}_T$, in response to a fixed choice of control $u(t) = u$, $\forall t \in [0, T]$, so as to drive the system state into $G$. On the other hand, this can also be viewed as a special case of a differential game on $[0, T]$ in which the control choice is restricted to the singleton $u$. Using either interpretation, one can characterize the evolution of the unsafe set through an appropriate Hamilton-Jacobi equation (Evans and Souganidis, 1984; Bardi and Capuzzo-Dolcetta, 1997; Mitchell et al., 2005).

In particular, suppose that the set $G$ has a level set representation $\phi_G : X \to \mathbb{R}$. Let $\phi : X \times [-T, 0] \to \mathbb{R}$ be the unique viscosity solution (Crandall and Lions, 1983) to the following HJB equation

$$\frac{\partial \phi}{\partial t} + \min\left[0,\ H\left(x, \frac{\partial \phi}{\partial x}\right)\right] = 0, \quad \phi(x, 0) = \phi_G(x), \tag{3.3}$$

where the optimal Hamiltonian is given by

$$H(x, p) = \min_{d \in D} p^T f(q_i, x, u, d). \tag{3.4}$$

Then by a special case of the arguments presented in Evans and Souganidis (1984) and Mitchell et al. (2005), we have

$$\mathcal{A}_T^{q_i,u}(G) = \{x \in X,\ \phi(x, -T) \le 0\}.$$

Several remarks are in order. First, the minimization with respect to $d$ in equation (3.4) gives the disturbance a slight advantage, as the disturbance has knowledge of the control input $u$ on $[0, T]$. Second, the Hamiltonian in equation (3.4) can be calculated analytically for systems in which the disturbance enters affinely in the model, namely when the vector field in each mode $q_i$ can be written in the form $f(q_i, x, u, d) = f_i(x, u) + g_i(x) d$, and the disturbance input space takes the form of a product of intervals $D = \prod_i [\underline{d}_i, \overline{d}_i]$. Note, however, that $f_i$ is not required to be affine in $u$. Third, the $\min[0, H]$ formulation in equation (3.3) constrains the reachable set to grow over time, which results in the property that $G \subseteq \mathcal{A}_T^{q_i,u}(G)$.
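The analytic form mentioned in the second remark can be worked out in one line. The following is a sketch under the stated disturbance-affine and box-shaped assumptions; the index $k$ over disturbance components and the shorthand $c_k$ are introduced here for illustration only and do not appear in the original text.

```latex
% Assumptions: f(q_i,x,u,d) = f_i(x,u) + g_i(x) d and
% D = \prod_k [\underline{d}_k, \overline{d}_k].
% Shorthand (hypothetical): c_k(x,p) is the k-th component of g_i(x)^T p.
\begin{align*}
H(x,p) &= \min_{d \in D} p^T \bigl( f_i(x,u) + g_i(x)\, d \bigr) \\
       &= p^T f_i(x,u) + \sum_k \min_{d_k \in [\underline{d}_k, \overline{d}_k]} c_k(x,p)\, d_k \\
       &= p^T f_i(x,u) + \sum_k \min\bigl\{ c_k(x,p)\, \underline{d}_k,\ c_k(x,p)\, \overline{d}_k \bigr\}.
\end{align*}
% The minimizing disturbance is d_k = \underline{d}_k when c_k(x,p) \ge 0
% and d_k = \overline{d}_k otherwise.
```

The minimization decouples across components because the objective is linear in $d$ and the constraint set is a box, so each term attains its minimum at an endpoint of its interval.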

On the computational side, a numerical toolbox (Mitchell, 2001a) is available to compute a convergent approximation of the viscosity solution to (3.3) on a discrete grid of the continuous state space $X = \mathbb{R}^n$, based upon an implementation of level set methods (Sethian, 1999; Osher and Fedkiw, 2002). However, because this grid is chosen to be uniform for numerical convergence, the computational cost scales exponentially with the continuous state dimension,
which  currently  limits  the  application  of  this  method  to  problems  with  n  <  5. 

Now consider a discrete reachability operator $\mathrm{Reach}$, taking as its argument a discrete state $q \in Q$ and producing as its output the subset of $Q$ reachable from $q$ in one step:

$$\mathrm{Reach}(q) = \{q' \in Q : \exists \sigma \in \Sigma,\ \delta(q, \sigma) = q'\}.$$

It follows directly that this operator can be computed as $\mathrm{Reach}(q) = \bigcup_{\sigma \in \Sigma} \delta(q, \sigma)$. By the definition of $\Sigma$, this union is finite.
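Because the command set is finite, the discrete operator amounts to a lookup over all commands. A minimal sketch follows, assuming the transition map $\delta$ is stored as a dictionary keyed by `(mode, command)`; the mode and command labels are hypothetical.

```python
def reach(q, commands, delta):
    """Return the set {delta(q, sigma) : sigma in commands}."""
    return {delta[(q, sigma)] for sigma in commands}


# Hypothetical two-mode system in which each command selects a mode.
COMMANDS = ("s1", "s2")
DELTA = {
    ("q1", "s1"): "q1",
    ("q1", "s2"): "q2",
    ("q2", "s1"): "q1",
    ("q2", "s2"): "q2",
}

successors = reach("q1", COMMANDS, DELTA)
```

Using a set removes duplicates automatically when several commands lead to the same mode.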

Given a set $G^H = \bigcup_{i=1}^{m} \{q_i\} \times G_i \subset Q \times X$, we define the one-step unsafe set with respect to $G^H$ as follows:

$$\mathcal{A}_T^H(G^H) = \{(q, x) \in Q \times X : \forall (\sigma, \hat{u}) \in \Sigma \times \hat{U},\ \exists d \in \mathcal{D}_T,\ \exists t \in [0, T],\ (q(t), x(t)) \in G^H\}. \tag{3.5}$$

In other words, this is the set of initial conditions which can reach $G^H$ within one time step under an admissible disturbance realization, regardless of the choice of control inputs. The following result provides a representation for $\mathcal{A}_T^H$ in terms of the continuous reachability operator $\mathcal{A}_T^{q_i,u}$ and the discrete reachability operator $\mathrm{Reach}$.

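Lemma 3.1. Let $G^H = \bigcup_{i=1}^{m} \{q_i\} \times G_i \subset Q \times X$. Then

$$\mathcal{A}_T^H(G^H) = \bigcup_{q_i \in Q} \{q_i\} \times \left( G_i \cup \bigcap_{q_j \in \mathrm{Reach}(q_i)} \bigcap_{\hat{u} \in \hat{U}} \mathcal{A}_T^{q_j,\hat{u}}(G_j) \right). \tag{3.6}$$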

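Proof. For notational convenience, we define $V^H = \bigcup_{i=1}^{m} \{q_i\} \times (G_i \cup V_i)$, where

$$V_i = \bigcap_{q_j \in \mathrm{Reach}(q_i)} \bigcap_{\hat{u} \in \hat{U}} \mathcal{A}_T^{q_j,\hat{u}}(G_j).$$

Let $(q_i, x) \in (\mathcal{A}_T^H(G^H))^C$. By the definition in (3.5), there exists a choice of controls $(\sigma, \hat{u}) \in \Sigma \times \hat{U}$ such that for every choice of disturbance realization $d \in \mathcal{D}_T$, the one step trajectory of $\mathcal{H}_{sw}$ initialized at $(q_i, x)$ satisfies $(q(t), x(t)) \notin G^H$, $\forall t \in [0, T]$. Let $q_j = \delta(q_i, \sigma)$, then this implies that $x \notin G_i$ and $x \notin \mathcal{A}_T^{q_j,\hat{u}}(G_j)$, and hence $x \notin G_i \cup V_i$. Thus, we have $(\mathcal{A}_T^H(G^H))^C \subseteq (V^H)^C$, or equivalently, $V^H \subseteq \mathcal{A}_T^H(G^H)$.

In order to prove the reverse inclusion, consider a state $(q_i, x) \in (V^H)^C$. Then by the definition of $V^H$, $x \notin G_i$ and there exist $q_j \in \mathrm{Reach}(q_i)$ and $\hat{u} \in \hat{U}$ such that $x \notin \mathcal{A}_T^{q_j,\hat{u}}(G_j)$. Let $\sigma \in \Sigma$ be a discrete command such that $q_j = \delta(q_i, \sigma)$. Then under the choice of controls $(\sigma, \hat{u})$, the trajectory of $\mathcal{H}_{sw}$ starting from $(q_i, x)$ satisfies $(q(t), x(t)) \notin G^H$, $\forall t \in [0, T]$, regardless of any admissible disturbance realization. Thus, $(q_i, x) \notin \mathcal{A}_T^H(G^H)$, from which it follows that $(V^H)^C \subseteq (\mathcal{A}_T^H(G^H))^C$, or equivalently, $\mathcal{A}_T^H(G^H) \subseteq V^H$. □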
We note briefly that under level set representations, computing the union or intersection of sets reduces to computing the pointwise minimum or maximum of level set functions. Specifically, suppose $\phi_A$ and $\phi_B$ are level set representations of sets $A$ and $B$, respectively. Then the set $A \cup B$ is represented by $\min\{\phi_A, \phi_B\}$.
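The min/max rule is easy to check on a small example. The sketch below assumes level set functions are plain Python callables and uses two hypothetical intervals on the real line, $A = [-1, 1]$ and $B = [0.5, 3]$; in a grid-based implementation the same rule would be applied node by node.

```python
def union(phi_a, phi_b):
    """Level set function of A union B: pointwise minimum."""
    return lambda x: min(phi_a(x), phi_b(x))


def intersection(phi_a, phi_b):
    """Level set function of A intersect B: pointwise maximum."""
    return lambda x: max(phi_a(x), phi_b(x))


# Hypothetical one-dimensional sets: A = [-1, 1], B = [0.5, 3].
phi_A = lambda x: abs(x) - 1.0
phi_B = lambda x: abs(x - 1.75) - 1.25

phi_union = union(phi_A, phi_B)
phi_inter = intersection(phi_A, phi_B)
```

For instance, `x = 2.0` belongs to $B$ only, so `phi_union(2.0) <= 0` while `phi_inter(2.0) > 0`.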

Now consider Algorithm 3.4.1 for computing a horizon-$N$ unsafe set under the policy class $\hat{\mathcal{M}}$.


Algorithm 3.4.1 Computation of horizon-$N$ Unsafe Set
Require: $W^H \subset Q \times X$ and $N \ge 1$
1: $V_0^H \Leftarrow (W^H)^C$
2: for $j = 0$ to $N - 1$ do
3:    $V_{j+1}^H \Leftarrow \mathcal{A}_T^H(V_j^H)$
4: end for
5: return $V_j^H$, $j = 1, \ldots, N$
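The structure of the iteration can be illustrated on a finite-state, discrete-time analogue. This is only a sketch under strong simplifying assumptions, not the Hamilton-Jacobi computation itself: the discrete mode and command are folded into a single state and input, and only sampling instants are checked. The toy dynamics and all names are hypothetical.

```python
def one_step_unsafe(G, states, inputs, disturbances, step):
    """Finite-state analogue of the one-step unsafe operator.

    A state is added when, for every input, some disturbance drives
    the successor into G. States already in G stay in the result.
    """
    result = set(G)
    for s in states:
        if all(any(step(s, u, d) in G for d in disturbances) for u in inputs):
            result.add(s)
    return result


def unsafe_sets(unsafe0, horizon, states, inputs, disturbances, step):
    """Return [V_0, V_1, ..., V_N] with V_{j+1} = one_step_unsafe(V_j)."""
    V = [set(unsafe0)]
    for _ in range(horizon):
        V.append(one_step_unsafe(V[-1], states, inputs, disturbances, step))
    return V


# Toy example: positions 0..6 on a line, hazard at 0. The disturbance
# can push the state down by 2, but only while the state is at most 2.
STATES = range(7)
INPUTS = (0, 1)
DISTURBANCES = (-2, 0)


def step(s, u, d):
    d_eff = d if s <= 2 else 0
    return max(0, min(6, s + u + d_eff))


V = unsafe_sets({0}, 4, STATES, INPUTS, DISTURBANCES, step)
```

In this toy problem the sets grow monotonically and stop changing after two iterations, which mirrors the behaviour the algorithm is designed to exploit.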


Proposition 3.1. Given a sampled-data switched system $\mathcal{H}_{sw}$ and a safe set $W^H \subset Q \times X$, let $V_N^H$ be the output of Algorithm 3.4.1. Then $(V_N^H)^C$ is a horizon-$N$ safe set.

Proof. Given $\mu \in \hat{\mathcal{M}}$ and $\gamma \in \Gamma$, we denote by $\mu_{k \to N}$ the sequence $(\mu_k, \mu_{k+1}, \ldots, \mu_{N-1})$, and by $\gamma_{k \to N}$ the sequence $(\gamma_k, \gamma_{k+1}, \ldots, \gamma_{N-1})$. The corresponding truncated control policy space and disturbance strategy space are denoted by $\hat{\mathcal{M}}_{k \to N}$ and $\Gamma_{k \to N}$, respectively. We will prove the following statement by backward induction on $k$: there exists a control policy $\mu_{k \to N} \in \hat{\mathcal{M}}_{k \to N}$ such that for every initial condition $(q_k, x_k) \in (V_{N-k}^H)^C$ and disturbance strategy $\gamma_{k \to N} \in \Gamma_{k \to N}$, the closed-loop trajectory of $\mathcal{H}_{sw}$ satisfies $(q(t), x(t)) \in W^H$, $\forall t \in [kT, NT]$. Clearly, the statement of the proposition follows from the case of $k = 0$.

First, for the case of $k = N - 1$, we have $V_1^H = \mathcal{A}_T^H((W^H)^C)$. By the definition of $\mathcal{A}_T^H$ in (3.5), for every $(q, x) \in (V_1^H)^C$, there exists a choice of control input $(\sigma, \hat{u})_{(q,x)} \in \Sigma \times \hat{U}$ such that for every disturbance realization $d \in \mathcal{D}_T$, the one step state trajectory satisfies $(q(t), x(t)) \in W^H$, $\forall t \in [(N-1)T, NT]$. Let $\mu^*_{N-1}(q, x) = (\sigma, \hat{u})_{(q,x)}$, $\forall (q, x) \in (V_1^H)^C$, then $\mu^*_{N-1}$ is a safe control policy with respect to $(V_1^H)^C$. Second, for the inductive step, we assume that for some $j \in \{1, 2, \ldots, N-1\}$, there exists a safe control policy $\mu^*_{j \to N} \in \hat{\mathcal{M}}_{j \to N}$ with respect to $(V_{N-j}^H)^C$. On the set $(V_{N-j+1}^H)^C = (\mathcal{A}_T^H(V_{N-j}^H))^C$, choose a one step control policy $\mu^*_{j-1}$ such that the trajectory of $\mathcal{H}_{sw}$ over $[(j-1)T, jT]$ avoids the set $V_{N-j}^H$ (the existence of such a policy is again implied by (3.5)). Then $\mu^*_{j-1 \to N} = (\mu^*_{j-1}, \mu^*_{j \to N})$ is a safe control policy with respect to $(V_{N-j+1}^H)^C$. The result then follows by induction. □

3.4.2  Safe  Control  Policies 

In the proof of Proposition 3.1, we showed the existence of a safe control policy with respect to $(V_N^H)^C$ within the restricted policy class $\hat{\mathcal{M}}$. The question then becomes whether an explicit representation for such a policy can be derived. It turns out that the reachable sets generated by Algorithm 3.4.1 provide us with such a representation.

Motivated by the expression for the reachability operator $\mathcal{A}_T^H$ in (3.6), we construct the following set-valued feedback maps for the choice of safe control inputs:

$$F_k^{\mathrm{Safe}}(q, x) = \left\{(\sigma, \hat{u}) \in \Sigma \times \hat{U} : x \notin \mathcal{A}_T^{\delta(q,\sigma),\hat{u}}\left(V_{N-k-1}^H(\delta(q, \sigma))\right)\right\} \tag{3.7}$$

for $(q, x) \in (V_{N-k}^H)^C$ and $k = 0, 1, \ldots, N-1$. In the above, we denote by $V_j^H(q_i)$ the component of $V_j^H$ in mode $q_i$. The following result provides a formal proof that these set-valued maps indeed constitute a finite horizon safe control policy on $(V_N^H)^C$.

Proposition 3.2. Let $V_j^H$, $j = 1, \ldots, N$ be the $j$-step unsafe sets, as computed using Algorithm 3.4.1. If $(V_N^H)^C \ne \emptyset$, then any control policy $\hat{\mu} \in \hat{\mathcal{M}}$ which satisfies

$$\hat{\mu}_k(q, x) = F_k^{\mathrm{Safe}}(q, x), \quad \forall (q, x) \in (V_{N-k}^H)^C, \tag{3.8}$$

for $k = 0, 1, \ldots, N-1$ is a horizon-$N$ safe control policy with respect to $(V_N^H)^C$.

Proof. By the representation of the reachability operator $\mathcal{A}_T^H$ in (3.6), it can be inferred that the sets $V_j^H$ satisfy

$$V_0^H \subseteq V_1^H \subseteq \cdots \subseteq V_N^H.$$

Thus, $(V_N^H)^C \ne \emptyset$ implies $(V_j^H)^C \ne \emptyset$, $\forall j = 0, 1, \ldots, N$. Furthermore, given that $V_0^H = (W^H)^C$, we also have $(V_j^H)^C \subseteq W^H$, $\forall j = 0, 1, \ldots, N$.

Let $\hat{\mu} \in \hat{\mathcal{M}}$ be any control policy which satisfies (3.8). We prove the following statement by forward induction on $k$: for any initial condition $(q(0), x(0)) \in (V_N^H)^C$ and disturbance strategy $\gamma \in \Gamma$, the trajectory of $\mathcal{H}_{sw}$ satisfies $(q(t), x(t)) \in W^H$, $\forall t \in [0, kT]$ and $(q(kT), x(kT)) \in (V_{N-k}^H)^C$. The proposition follows from the case of $k = N$.

For $k = 0$, it is clear that $(q(0), x(0)) \in (V_N^H)^C \subseteq W^H$. For the inductive step, we assume that for some $j \in \{0, 1, \ldots, N-1\}$, the system trajectory satisfies $(q(t), x(t)) \in W^H$, $\forall t \in [0, jT]$ and $(q(jT), x(jT)) \in (V_{N-j}^H)^C$, regardless of the disturbance strategy $\gamma \in \Gamma$. With the assumption on $\hat{\mu}$, we have $\hat{\mu}_j(q(jT), x(jT)) = F_j^{\mathrm{Safe}}(q(jT), x(jT))$. From (3.2) and (3.7), it can then be inferred that for any control input $(\sigma, \hat{u}) \in \hat{\mu}_j(q(jT), x(jT))$, the one-step trajectory satisfies $(q(t), x(t)) \in (V_{N-j-1}^H)^C \subseteq W^H$, $\forall t \in [jT, (j+1)T]$, regardless of the disturbance realization. The result then follows by induction. □

Using this result, we can compute, by means of Algorithm 3.4.1, the collections of level set functions representing the sets $V_j^H$ in an offline setting, and then use these functions as lookup tables to extract safe control inputs as state measurements are received. A possible implementation of this procedure is given in Algorithm 3.4.2.

It should be remarked that given level set representations $\phi_k^{q',\hat{u}}$ of the sets $\mathcal{A}_T^{q',\hat{u}}(V_{N-k-1}^H(q'))$, checking the condition $x(kT) \notin \mathcal{A}_T^{q',\hat{u}}(V_{N-k-1}^H(q'))$ is equivalent to checking the condition

$$\phi_k^{q',\hat{u}}(x(kT)) > 0.$$
Algorithm 3.4.2 Online Implementation of Finite Horizon Safe Control Policy
Require: $V_j^H$, $j = 1, \ldots, N$; $(q(0), x(0)) \in (V_N^H)^C$
1: for $k = 0$ to $N - 1$ do
2:    $F_k^{\mathrm{Safe}} \Leftarrow \emptyset$;
3:    Measure state $(q(kT), x(kT))$;
4:    for all $(\sigma, \hat{u}) \in \Sigma \times \hat{U}$ do
5:       $q' \Leftarrow \delta(q(kT), \sigma)$;
6:       if $x(kT) \notin \mathcal{A}_T^{q',\hat{u}}(V_{N-k-1}^H(q'))$ then
7:          Add $(\sigma, \hat{u})$ to $F_k^{\mathrm{Safe}}$;
8:       end if
9:    end for
10:   Apply input $(\sigma_k, \hat{u}_k) \in F_k^{\mathrm{Safe}}$;
11: end for
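The inner loop of the online procedure is a table lookup followed by a sign test. The following sketch assumes the offline computation has produced one callable level set function per triple `(k, next_mode, u)`, stored in a dictionary; this storage layout, the function name, and the scalar example are assumptions made for illustration, not the original MATLAB implementation.

```python
def safe_inputs(q, x, k, commands, inputs, delta, phi):
    """Collect the pairs (sigma, u) that pass the level set test at step k.

    delta maps (mode, command) to the next mode; phi maps
    (k, next_mode, u) to a callable level set function whose positive
    values mark states outside the corresponding one-step unsafe set.
    """
    safe = []
    for sigma in commands:
        q_next = delta[(q, sigma)]
        for u in inputs:
            if phi[(k, q_next, u)](x) > 0.0:
                safe.append((sigma, u))
    return safe


# Hypothetical one-mode, scalar-state lookup tables for illustration only.
delta = {("q1", "stay"): "q1"}
phi = {
    (0, "q1", -1.0): lambda x: x - 2.0,
    (0, "q1", 1.0): lambda x: x - 0.5,
}

choices = safe_inputs("q1", 1.0, 0, ["stay"], [-1.0, 1.0], delta, phi)
```

Any element of the returned list may be applied. The list is guaranteed to be nonempty only when the measured state lies in the appropriate safe set, so a caller would normally treat an empty result as a violated precondition.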


3.4.3  Infinite  Horizon  Safety  Problem 

Now consider an extension of the finite horizon safety control problem discussed in the preceding sections to the case in which the control objective is to keep the system trajectory within a safe set $W^H$ for all times. Specifically, we are interested in computing a set $G_{\mathrm{Safe}}^{\infty} \subset Q \times X$ such that there exists an admissible control policy $\mu \in \mathcal{M}$ so that for any initial condition $(q_0, x_0) \in G_{\mathrm{Safe}}^{\infty}$ and disturbance strategy $\gamma \in \Gamma$, the closed-loop state trajectory $(q(\cdot), x(\cdot))$ satisfies $(q(t), x(t)) \in W^H$ for all $t \ge 0$. Furthermore, we would like to derive an infinite horizon safe control policy from the result of such a computation.

As observed in the proof of Proposition 3.2, the sequence of unsafe sets $V_j^H$, $j = 0, 1, \ldots$, as computed by Algorithm 3.4.1 satisfies the following monotonicity condition:

$$V_0^H \subseteq V_1^H \subseteq V_2^H \subseteq \cdots$$

It is then intuitive that if $V_j^H$ eventually stops growing with successive iterations of the algorithm and converges to a maximal unsafe set, the set of all states which lie outside this maximal unsafe set is an infinite horizon safe set.

More precisely, suppose that Algorithm 3.4.1 converges to a fixed point of the operator $\mathcal{A}_T^H$ within a finite number of iterations, namely

$$V_{N_0+1}^H = \mathcal{A}_T^H(V_{N_0}^H) = V_{N_0}^H, \quad \text{some } N_0 < \infty. \tag{3.9}$$

Then by induction, it can be inferred that $V_N^H = V_{N_0}^H$, $\forall N \ge N_0$. Applying Proposition 3.1, it then follows that $(V_{N_0}^H)^C$ is a horizon-$N$ safe set for every $N \ge N_0$, and hence an infinite horizon safe set. In the case that this set is nonempty, we can also derive an infinite horizon safe control policy from the representation of $V_{N_0}^H$. Specifically, consider the set of safe control inputs defined by

$$F^{\mathrm{Safe}}(q, x) = \left\{(\sigma, \hat{u}) \in \Sigma \times \hat{U} : x \notin \mathcal{A}_T^{\delta(q,\sigma),\hat{u}}\left(V_{N_0}^H(\delta(q, \sigma))\right)\right\}. \tag{3.10}$$

Then by a similar argument as in the proof of Proposition 3.2, any stationary policy $\hat{\mu} = (\mu, \mu, \ldots) \in \hat{\mathcal{M}}$ which satisfies

$$\mu(q, x) = F^{\mathrm{Safe}}(q, x), \quad \forall (q, x) \in (V_{N_0}^H)^C$$

is an infinite horizon safe control policy with respect to $(V_{N_0}^H)^C$. An algorithm for implementing such a control policy can be constructed in the same way as Algorithm 3.4.2.

In the literature, conditions under which (3.9) holds have been studied in terms of the concept of decidability, which in this case concerns whether an infinite horizon reachability question can be answered by a finite computation. To the best of our knowledge, currently known decidability
results  for  hybrid  systems  reachability  are  restricted  to  the  class  of  timed  automata,  linear  hybrid 
automata,  and  linear  continuous  dynamics  with  special  structures  (Henzinger  et  al.,  1998;  Alur 
et  al.,  2000).  Nonetheless,  for  certain  classes  of  problems  in  nonlinear  differential  games,  it  has 
been  shown  that  a  maximal  unsafe  set  exists  (Isaacs,  1967;  Merz,  1972),  and  that  a  numerical 
reachability  computation  indeed  converges  to  such  a  set  (Mitchell  et  al.,  2005).  In  such  cases,  one 
may check (3.9) in terms of the convergence of the level set functions representing $V_j^H$.
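One simple way to test such convergence numerically is to compare successive level set functions node by node on the computational grid. The sketch below assumes each function is stored as a flat sequence of grid values; the sup-norm criterion and the tolerance value are assumptions chosen for illustration.

```python
def has_converged(phi_prev, phi_next, tol=1e-3):
    """Compare two level set functions sampled on the same grid nodes.

    Returns True when the largest absolute change between successive
    iterates is at most tol.
    """
    if len(phi_prev) != len(phi_next):
        raise ValueError("grids must have the same number of nodes")
    return max(abs(a - b) for a, b in zip(phi_prev, phi_next)) <= tol
```

In practice the tolerance would be tied to the grid spacing, since changes smaller than the discretization error are not meaningful.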

Revisiting the aircraft conflict resolution example from Section 3.2, consider a safety control problem where we would like to keep the relative aircraft states away from a collision zone defined by

$$W_i^C = \left\{x \in \mathbb{R}^3 : \sqrt{x_1^2 + x_2^2} \le r_0\right\}, \quad i = 1, 2$$

for some radius $r_0 > 0$. For the reachability computation, we select the input bounds for aircraft 1 and aircraft 2 as follows: in mode 1, the velocity range of aircraft 1 is chosen to be $U_1 = [400\ \text{kts}, 500\ \text{kts}]$, with a constant heading input $\omega_0 = 0$; in mode 2, the velocity of aircraft 1 is fixed at $v_0 = 450$ kts, while the angular velocity is allowed to vary within the range $U_2 = [-2\ \text{deg/s}, 2\ \text{deg/s}]$; in both modes, the aircraft 2 input ranges are chosen to be $D_1 = [400\ \text{kts}, 500\ \text{kts}]$ and $D_2 = [-1\ \text{deg/s}, 1\ \text{deg/s}]$. The collision zone radius is set as $r_0 = 5$ nmi, while the sampling interval is set as $T = 10$ s.

With a uniform discretization of $U_1$ and $U_2$ into 11 input levels, we perform an unsafe set computation using Algorithm 3.4.1. In this case, it was found that this computation converges to within numerical accuracy of a fixed point after about 7 time steps. The resulting infinite horizon unsafe set $(G_{\mathrm{Safe}}^{\infty})^C$ is shown in Figure 3.2 along with the collision zone $(W^H)^C$. To illustrate the set-valued control policy obtained from this computation, we take a slice of the unsafe sets $\mathcal{A}_T^{q_1,500}((G_{\mathrm{Safe}}^{\infty}(q_1))^C)$ and $\mathcal{A}_T^{q_2,2}((G_{\mathrm{Safe}}^{\infty}(q_2))^C)$ at a relative heading angle of $\pi$ radians (a scenario in which the two aircraft are directly facing each other). According to (3.10), in the complement of the set $\mathcal{A}_T^{q_1,500}((G_{\mathrm{Safe}}^{\infty}(q_1))^C)$, one can choose the straight maneuver $(\sigma_1, 500\ \text{kts})$ as the safe input, while in the complement of $\mathcal{A}_T^{q_2,2}((G_{\mathrm{Safe}}^{\infty}(q_2))^C)$, one can choose the turn maneuver $(\sigma_2, 2\ \text{deg/s})$.
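A uniform quantization of the two input ranges can be generated as follows. This is a minimal sketch that assumes the 11 levels include both endpoints of each interval, which is consistent with the maneuvers at 500 kts and 2 deg/s mentioned above; the helper name is hypothetical.

```python
def quantize(lo, hi, levels):
    """Return `levels` evenly spaced values from lo to hi, endpoints included."""
    if levels < 2:
        raise ValueError("need at least two levels")
    spacing = (hi - lo) / (levels - 1)
    return [lo + i * spacing for i in range(levels)]


U1_hat = quantize(400.0, 500.0, 11)  # speed levels for mode 1, in kts
U2_hat = quantize(-2.0, 2.0, 11)     # turn-rate levels for mode 2, in deg/s
```

Under this assumption the speed levels are spaced 10 kts apart and the turn-rate levels 0.4 deg/s apart.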

From the result of this reachability computation, the infinite horizon safety controller is synthesized and implemented in simulation using Algorithm 3.4.2, with aircraft 2 applying random inputs chosen from within its input ranges $D_1$ and $D_2$. A sample run of this simulation is given in Figure 3.3, in which aircraft 1 successfully avoids a collision with aircraft 2 over a 4 minute time horizon. In this case, we ran a MATLAB implementation of Algorithm 3.4.2 on a 2 GHz Intel Xeon processor with 4 GB of memory, and the average computation time for each iteration of the algorithm was found to be approximately 0.1 seconds.

Figure 3.2: Results of infinite horizon reachability calculations for two aircraft conflict resolution example: (a) Infinite horizon unsafe set; (b) Slice of unsafe set at relative angle of $\pi$ radians.

3.5  Reach-avoid  Controller  Synthesis 

Using a similar approach as in the safety problem, we now discuss a solution to the finite horizon reach-avoid problem under Assumption 3.1. In particular, a reachability algorithm is given for computing the set of states from which a target set $R^H$ can be reached while avoiding an unsafe set $A^H$, under the quantized policy space $\hat{\mathcal{M}}$, along with a procedure for synthesizing a reach-avoid control policy from the result of this computation.

3.5.1  Reach-avoid  Set  Computation 

For the objective of reaching a target set, we introduce a continuous state reachability operator $\mathcal{R}_T^{q_i,u}$, taking as its argument a set $G \subset X$ and producing as its output the set of states that can be controlled inside $G$ at time $T$, regardless of the disturbance realization:

$$\mathcal{R}_T^{q_i,u}(G) = \{x_0 \in X : \forall d(\cdot) \in \mathcal{D}_T,\ x(T) \in G\},$$

where $x(\cdot)$ is the solution of the ODE $\dot{x}(t) = f(q_i, x(t), u, d(t))$, $x(0) = x_0$ on the interval $[0, T]$.

Figure 3.3: Sample simulation run of two aircraft conflict resolution example.

The computation of $\mathcal{R}_T^{q_i,u}(G)$ can also be viewed from a differential game perspective, in which the control chooses an input $u(t) = u$, $\forall t \in [0, T]$ so as to achieve $x(T) \in G$, while the disturbance selects, in response, a realization $d(\cdot) \in \mathcal{D}_T$ so as to prevent the control from doing so. An HJB equation encoding this terminal cost problem is given by

$$\frac{\partial \phi}{\partial t} + H\left(x, \frac{\partial \phi}{\partial x}\right) = 0, \quad \phi(x, 0) = \phi_G(x), \tag{3.11}$$

with the optimal Hamiltonian

$$H(x, p) = \max_{d \in D} p^T f(q_i, x, u, d). \tag{3.12}$$

Let $\phi$ be the unique viscosity solution to (3.11), then by an application of the results in Evans and Souganidis (1984), it follows that

$$\mathcal{R}_T^{q_i,u}(G) = \{x \in X,\ \phi(x, -T) \le 0\}.$$

Given a target set $G_1 \subset X$ and an avoid set $G_2 \subset X$, consider a one step reach-avoid operator for mode $q_i$ under input $u$ as defined by

$$\mathcal{RA}_T^{q_i,u}(G_1, G_2) = \{x_0 \in X : \forall d(\cdot) \in \mathcal{D}_T,\ (x(T) \in G_1) \wedge (x(t) \notin G_2,\ \forall t \in [0, T])\}.$$

From the definitions of the operators $\mathcal{A}_T^{q_i,u}$ and $\mathcal{R}_T^{q_i,u}$, it follows directly that

$$\mathcal{RA}_T^{q_i,u}(G_1, G_2) = \mathcal{R}_T^{q_i,u}(G_1) \cap (\mathcal{A}_T^{q_i,u}(G_2))^C.$$
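In terms of level set functions, the decomposition above turns a reach-avoid membership query into two sign tests. The sketch below assumes callable level set functions for the reach set and for the one-step unsafe set of the avoid region; the names and the one-dimensional example sets are hypothetical.

```python
def in_reach_avoid(x, phi_reach, phi_unsafe):
    """Test x in R(G1) intersected with the complement of A(G2).

    phi_reach(x) <= 0 means x is in the reach set; phi_unsafe(x) > 0
    means x is outside the (closed) one-step unsafe set.
    """
    return phi_reach(x) <= 0.0 and phi_unsafe(x) > 0.0


# Hypothetical scalar example: reach set [-1, 3], unsafe set [-0.5, 0.5].
phi_reach = lambda x: abs(x - 1.0) - 2.0
phi_unsafe = lambda x: abs(x) - 0.5
```

Testing the two signs separately keeps the strict inequality on the unsafe side, which a single pointwise maximum of `phi_reach` and `-phi_unsafe` would blur on the boundary of the unsafe set.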

Now consider a one step reach-avoid operator for the switched system $\mathcal{H}_{sw}$ as defined by

$$\mathcal{RA}_T^H(G_1^H, G_2^H) = \{(q, x) : \exists (\sigma, \hat{u}) \in \Sigma \times \hat{U},\ \forall d \in \mathcal{D}_T,\ ((q(T), x(T)) \in G_1^H) \wedge ((q(t), x(t)) \notin G_2^H,\ \forall t \in [0, T])\}, \tag{3.13}$$

where $G_1^H = \bigcup_{i=1}^{m} \{q_i\} \times G_{1,i}$ and $G_2^H = \bigcup_{i=1}^{m} \{q_i\} \times G_{2,i}$ are subsets of $Q \times X$. In other words, this is the set of states from which $G_1^H$ can be reached at the end of a sampling interval, while avoiding $G_2^H$ throughout. The following result provides a characterization of this operator in terms of a combination of discrete and continuous reachability computations.

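Lemma 3.2. Let $G_1^H = \bigcup_{i=1}^{m} \{q_i\} \times G_{1,i} \subset Q \times X$ and $G_2^H = \bigcup_{i=1}^{m} \{q_i\} \times G_{2,i} \subset Q \times X$. Then

$$\mathcal{RA}_T^H(G_1^H, G_2^H) = \bigcup_{q_i \in Q} \{q_i\} \times \left( (G_{2,i})^C \cap \bigcup_{q_j \in \mathrm{Reach}(q_i)} \bigcup_{\hat{u} \in \hat{U}} \mathcal{RA}_T^{q_j,\hat{u}}(G_{1,j}, G_{2,j}) \right). \tag{3.14}$$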

Proof. For notational convenience, we define $V^H = \bigcup_{i=1}^{m} \{q_i\} \times ((G_{2,i})^C \cap V_i)$, where

$$V_i = \bigcup_{q_j \in \mathrm{Reach}(q_i)} \bigcup_{\hat{u} \in \hat{U}} \mathcal{RA}_T^{q_j,\hat{u}}(G_{1,j}, G_{2,j}).$$

Let $(q_i, x) \in \mathcal{RA}_T^H(G_1^H, G_2^H)$. By the definition in (3.13), there exists a choice of controls $(\sigma, \hat{u}) \in \Sigma \times \hat{U}$ such that for every choice of disturbance realization $d \in \mathcal{D}_T$, the one step trajectory of $\mathcal{H}_{sw}$ initialized at $(q_i, x)$ satisfies $(q(T), x(T)) \in G_1^H$ and $(q(t), x(t)) \notin G_2^H$, $\forall t \in [0, T]$. Let $q_j = \delta(q_i, \sigma)$, then this implies that $x \notin G_{2,i}$ and $x \in \mathcal{RA}_T^{q_j,\hat{u}}(G_{1,j}, G_{2,j})$, and hence $x \in (G_{2,i})^C \cap V_i$. Thus, we have $\mathcal{RA}_T^H(G_1^H, G_2^H) \subseteq V^H$.

In order to prove the reverse inclusion, consider a state $(q_i, x) \in V^H$. Then by the definition of $V^H$, $x \notin G_{2,i}$ and there exist $q_j \in \mathrm{Reach}(q_i)$ and $\hat{u} \in \hat{U}$ such that $x \in \mathcal{RA}_T^{q_j,\hat{u}}(G_{1,j}, G_{2,j})$. Let $\sigma \in \Sigma$ be a discrete command such that $q_j = \delta(q_i, \sigma)$. Then under the choice of controls $(\sigma, \hat{u})$, the trajectory of $\mathcal{H}_{sw}$ starting from $(q_i, x)$ satisfies $(q(T), x(T)) \in G_1^H$ and $(q(t), x(t)) \notin G_2^H$, $\forall t \in [0, T]$, regardless of any admissible disturbance realization. Thus, $(q_i, x) \in \mathcal{RA}_T^H(G_1^H, G_2^H)$, from which it follows that $V^H \subseteq \mathcal{RA}_T^H(G_1^H, G_2^H)$. □

Now consider Algorithm 3.5.1 for computing a horizon-$N$ reach-avoid set under quantized control policies.

Proposition 3.3. Given a sampled-data switched system $\mathcal{H}_{sw}$, a target set $R^H \subset Q \times X$ and an avoid set $A^H \subset Q \times X$, let $S_N^H$ be the output of Algorithm 3.5.1. Then $S_N^H$ is a horizon-$N$ reach-avoid set.
Algorithm 3.5.1 Computation of Finite Horizon Reach-avoid Set
Require: $R^H, A^H \subset Q \times X$
1: $S_0^H \Leftarrow R^H \setminus A^H$
2: for $j = 0$ to $N - 1$ do
3:    $S_{j+1}^H \Leftarrow \mathcal{RA}_T^H(S_j^H, A^H) \cup S_j^H$;
4: end for
5: return $S_j^H$, $j = 1, \ldots, N$
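As with the safety computation, the recursion can be illustrated on a finite-state, discrete-time analogue. The sketch below is not the level set computation: it folds mode and command into a single state and input and checks the avoid constraint only at sampling instants. The toy dynamics and all names are hypothetical.

```python
def one_step_reach_avoid(G1, G2, states, inputs, disturbances, step):
    """States outside G2 from which some input forces the successor
    into G1 and outside G2, whatever the disturbance does."""
    result = set()
    for s in states:
        if s in G2:
            continue
        for u in inputs:
            successors = [step(s, u, d) for d in disturbances]
            if all(n in G1 and n not in G2 for n in successors):
                result.add(s)
                break
    return result


def reach_avoid_sets(target, avoid, horizon, states, inputs, disturbances, step):
    """Return [S_0, ..., S_N] with S_{j+1} = RA(S_j, avoid) union S_j."""
    S = [set(target) - set(avoid)]
    for _ in range(horizon):
        grown = one_step_reach_avoid(S[-1], avoid, states, inputs, disturbances, step)
        S.append(grown | S[-1])
    return S


# Toy example: positions 0..6, target at 6, avoid region at 0.
def step(s, u, d):
    return max(0, min(6, s + u + d))


S = reach_avoid_sets({6}, {0}, 3, range(7), (0, 2), (-1, 0), step)
```

In this toy problem each iteration adds one more position from which the target can be reached robustly, so the sets are nested by construction.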


Proof. Similarly as in the proof of Proposition 3.1, we will prove the following statement by backward induction on $k$: there exists $\mu_{k \to N} \in \hat{\mathcal{M}}_{k \to N}$ such that for every initial condition $(q, x) \in S_{N-k}^H \setminus R^H$ and $\gamma_{k \to N} \in \Gamma_{k \to N}$, the trajectory of $\mathcal{H}_{sw}$ satisfies $(q((k+1)T), x((k+1)T)) \in S_{N-k-1}^H$, $(q(jT), x(jT)) \in R^H$ for some $j \in \{k+1, \ldots, N\}$, and $(q(t), x(t)) \notin A^H$, $\forall t \in [kT, jT]$. The statement of the proposition again follows from the case of $k = 0$.

First, for $k = N - 1$, we have $S_1^H = \mathcal{RA}_T^H(S_0^H, A^H) \cup S_0^H$, where $S_0^H = R^H \setminus A^H$. By the definition of $\mathcal{RA}_T^H$ in (3.13), for every $(q, x) \in \mathcal{RA}_T^H(S_0^H, A^H)$, there exists a choice of controls $(\sigma, \hat{u})_{(q,x)} \in \Sigma \times \hat{U}$ such that for every disturbance realization $d \in \mathcal{D}_T$, the closed-loop trajectory satisfies $(q(NT), x(NT)) \in R^H$ and $(q(t), x(t)) \notin A^H$, $\forall t \in [(N-1)T, NT]$. Let $\mu^*_{N-1}$ be any one step policy which satisfies $\mu^*_{N-1}(q, x) = (\sigma, \hat{u})_{(q,x)}$, $\forall (q, x) \in \mathcal{RA}_T^H(S_0^H, A^H)$, then $\mu^*_{N-1}$ has the required properties.

Next, suppose that the induction hypothesis holds for some $j \in \{1, 2, \ldots, N-1\}$. Then there exists a reach-avoid control policy $\mu_{j \to N} = (\mu_j, \mu_{j+1}, \ldots, \mu_{N-1})$ with respect to $S_{N-j}^H$. Furthermore, under the one step policy $\mu_j$, the closed-loop trajectory starting from any initial condition in $S_{N-j}^H \setminus R^H$ satisfies $(q((j+1)T), x((j+1)T)) \in S_{N-j-1}^H$. Now consider an initial condition $(q, x) \in \mathcal{RA}_T^H(S_{N-j}^H, A^H)$. By (3.13), there exists a choice of controls $(\sigma, \hat{u})_{(q,x)} \in \Sigma \times \hat{U}$ such that the one step trajectory of $\mathcal{H}_{sw}$ satisfies $(q((j+1)T), x((j+1)T)) \in S_{N-j}^H$ and $(q(t), x(t)) \notin A^H$, $\forall t \in [jT, (j+1)T]$. Choose a one step policy $\mu^*_{j-1}$ as follows:

$$\mu^*_{j-1}(q, x) = \begin{cases} \mu_j(q, x), & (q, x) \in S_{N-j}^H, \\ (\sigma, \hat{u})_{(q,x)}, & (q, x) \in S_{N-j+1}^H \setminus S_{N-j}^H. \end{cases}$$

Then $\mu^*_{j-1 \to N} = (\mu^*_{j-1}, \mu_{j \to N})$ is a control policy with the required properties. The desired result then follows by induction. □

3.5.2  Reach-avoid  Control  Policy 

As in the case of the safety control problem, one can derive an explicit representation of the reach-avoid control policy from the reachability computation in Algorithm 3.5.1. In particular, suppose that $S_0^H = R^H \setminus A^H \ne \emptyset$. We define a function $k_{\min} : S_N^H \to \{0, 1, \ldots, N\}$ by

$$k_{\min}(q, x) = \min\{j \in \{0, 1, \ldots, N\} : (q, x) \in S_j^H\}.$$

This can be interpreted as the minimum time to reach the target set from a feasible initial condition $(q, x) \in S_N^H$, with respect to the quantized control policy space $\hat{\mathcal{M}}$.

At a state $(q, x) \in S_N^H \setminus S_0^H$, consider the set of feasible control inputs for the reach-avoid problem as defined by

$$F^{RA}(q, x) = \left\{(\sigma, \hat{u}) \in \Sigma \times \hat{U} : x \in \mathcal{RA}_T^{\delta(q,\sigma),\hat{u}}\left(S_{k_{\min}(q,x)-1}^H(\delta(q, \sigma)),\ A^H(\delta(q, \sigma))\right)\right\}. \tag{3.15}$$

It can be checked directly that this set is nonempty for every $(q, x) \in S_N^H \setminus S_0^H$. For $(q, x) \in S_0^H$, we define $F^{RA}(q, x) = \Sigma \times \hat{U}$.
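Given the nested reach-avoid sets, the index $k_{\min}$ is the position of the first set containing the state. A minimal sketch follows, assuming the sets are available as a list of Python sets indexed by $j$; in a level set implementation the membership test would be a sign check on the stored function instead.

```python
def k_min(state, reach_avoid_sets):
    """Return the smallest j such that state is in reach_avoid_sets[j]."""
    for j, S_j in enumerate(reach_avoid_sets):
        if state in S_j:
            return j
    raise ValueError("state is outside the horizon-N reach-avoid set")


# Hypothetical nested sets S_0, S_1, S_2 over integer-labelled states.
S_example = [{6}, {5, 6}, {4, 5, 6}]
```

Here `k_min(6, S_example)` is 0 because the state is already in the target, while `k_min(4, S_example)` is 2.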

Proposition 3.4. Let $S_j^H$, $j = 1, \ldots, N$ be the $j$-step reach-avoid sets, as computed through Algorithm 3.5.1. If $R^H \setminus A^H \ne \emptyset$, then any control policy $\hat{\mu} \in \hat{\mathcal{M}}$ which satisfies

$$\hat{\mu}_k(q, x) = F^{RA}(q, x), \quad \forall (q, x) \in S_N^H, \tag{3.16}$$

for $k = 0, 1, \ldots, N-1$, is a horizon-$N$ reach-avoid control policy with respect to $S_N^H$.

Proof. Let $\hat{\mu} \in \hat{\mathcal{M}}$ be any control policy which satisfies (3.16). We prove the following statement by forward induction on $k$: for any initial condition $(q(0), x(0)) \in S_N^H$ and disturbance strategy $\gamma \in \Gamma$, the trajectory of $\mathcal{H}_{sw}$ on $[0, kT]$ satisfies at least one of the following conditions:

1. $\exists l \le k$, $(q(lT), x(lT)) \in S_0^H$ and $(q(t), x(t)) \notin A^H$, $\forall t \in [0, lT]$;

2. $(q(kT), x(kT)) \in S_{N-k}^H$ and $(q(t), x(t)) \notin A^H$, $\forall t \in [0, kT]$.

The proposition then follows from the case of $k = N$.

Let $(q(0), x(0)) \in S_N^H$ and $\gamma \in \Gamma$. For $k = 0$, we have by the definition of the operator $\mathcal{RA}_T^H$ in (3.13) and the set $S_N^H$ in Algorithm 3.5.1 that $S_N^H \subseteq (A^H)^C$. For the inductive step, we assume that either condition 1 or condition 2 holds for some $j \in \{0, 1, \ldots, N-1\}$. If condition 1 holds for the trajectory on $[0, jT]$, then clearly this condition also holds for the trajectory on $[0, (j+1)T]$. Otherwise, condition 2 holds, which implies that $(q(jT), x(jT)) \in S_{N-j}^H \setminus S_0^H$. Let $k_0 = k_{\min}(q(jT), x(jT))$. From the definition of $k_{\min}$, we can infer that $0 < k_0 \le N - j$ and that $(q(jT), x(jT)) \in S_{k_0}^H \setminus S_{k_0-1}^H$. Thus, $(q(jT), x(jT)) \in \mathcal{RA}_T^H(S_{k_0-1}^H, A^H)$, and for any choice of control $(\sigma_j, \hat{u}_j) \in F^{RA}(q(jT), x(jT))$, the resulting one step trajectory satisfies $(q((j+1)T), x((j+1)T)) \in S_{k_0-1}^H$ and $(q(t), x(t)) \notin A^H$, $\forall t \in [jT, (j+1)T]$. By the assumption on the control policy $\hat{\mu}$ and the observation that $S_{k_0-1}^H \subseteq S_{N-j-1}^H$, it then follows that condition 2 holds on $[0, (j+1)T]$. The result then follows by induction. □

As in the safety problem, we can compute the collections of level set functions representing $S_j^H$ using Algorithm 3.5.1. These functions can then be stored as lookup tables for the online extraction of control inputs, for example, according to Algorithm 3.5.2.

As in the implementation of the safe control policy, checking the set-membership conditions in Algorithm 3.5.2 is equivalent to checking inequality conditions with respect to level set representations of the sets $S_j^H$ and $\mathcal{RA}_T^{q',\hat{u}}(S_{j-1}^H(q'), A^H(q'))$.
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Algorithm  3.5.2  Online  Implementation  of  Finite  Horizon  Reach-avoid  Control  Policy 

Require:  S*j,  j  =  1  (q(0),x(0))  G  S% 

1:  for  k  —  0  to  N  —  1  do 
2:  Measure  state  (, q(kT),x(kT )) 

3:  if  (q(kT),x(kT))  e  RH \AH  then 

4:  Terminate  algorithm; 

5:  else 

6:  F^  ^  0 

7:  Find  minimum  j  such  that  ( q(kT),x(kT ))  G 

8:  for  all  ( (7,  u)  G  E  x  (7  do 

9:  4=  8{q{kT):  O'); 

10:  if  .v(FT)  G  ^%£/‘^,u(SIj_l(qr),AH (qr))  then 

11:  Add  (<7,  w)  to  F^; 

12:  end  if 

13:  end  for 

14:  Apply  input  (ok,  uk)  G  F^; 

15:  end  if 

16:  end  for 
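The membership tests in Algorithm 3.5.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the on-board implementation: the toy system is a scalar integrator x(k+1) = x(k) + u(k) with a single discrete mode (so the mode update is omitted), there is no avoid set, and the level set functions are closed-form stand-ins for the lookup tables produced by Algorithm 3.5.1. The names `feasible_inputs` and `run_policy` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A set is represented by a level set function phi, with x in the set iff phi(x) <= 0.
LevelSet = Callable[[float], float]


def feasible_inputs(
    x: float,
    S: Dict[int, LevelSet],
    RA: Dict[Tuple[int, float], LevelSet],
    inputs: List[float],
) -> Tuple[int, List[float]]:
    """Find the smallest j with x in S_j, then keep inputs u with x in RA^u(S_{j-1})."""
    j = next(k for k in sorted(S) if S[k](x) <= 0.0)
    return j, [u for u in inputs if RA[(j, u)](x) <= 0.0]


def run_policy(x0: float, N: int) -> Tuple[float, List[float]]:
    """Toy closed loop: x+ = x + u, target |x| <= 0.5, no avoid set (assumptions)."""
    inputs = [-1.0, 0.0, 1.0]
    # Stand-ins for stored lookup tables: S_j = {|x| <= 0.5 + j}.
    S = {j: (lambda x, j=j: abs(x) - (0.5 + j)) for j in range(1, N + 1)}
    # One-step set for input u with respect to S_{j-1}: {|x + u| <= j - 0.5}.
    RA = {
        (j, u): (lambda x, j=j, u=u: abs(x + u) - (j - 0.5))
        for j in range(1, N + 1)
        for u in inputs
    }
    x, applied = x0, []
    for _ in range(N):
        if abs(x) <= 0.5:  # target reached: terminate
            break
        _, admissible = feasible_inputs(x, S, RA, inputs)
        u = admissible[0]  # any admissible input may be applied
        applied.append(u)
        x = x + u
    return x, applied
```

Each online step costs one sign check per stored level set function, which is why the sets can be computed offline and only evaluated on board.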


3.6  Experimental  Results 

In  this  section,  we  will  discuss  an  experimental  application  of  the  proposed  controller  synthesis 
algorithms  to  a  quadrotor  helicopter  platform  -  the  Stanford  Testbed  of  Autonomous  Rotorcraft 
for  Multi-Agent  Control  (STARMAC).  For  a  comprehensive  overview  of  the  development  of  this 
platform  and  its  aerodynamic  modeling,  the  interested  reader  may  refer  to  Hoffmann  et  al.  (2007). 
In  our  experiments,  a  hover  control  problem  is  considered,  with  the  objective  of  controlling  a 
quadrotor  helicopter  to  reach  a  hover  region  over  a  stationary  or  moving  ground  target,  while 
satisfying  a  velocity  constraint,  and  then  remain  within  the  hover  region,  regardless  of  possible 
movements  by  the  ground  target  (see  Figure  3.4). 

Given  a  previously  designed  inner  attitude  control  loop,  the  hover  control  problem  involves 
the  selection  of  pitch  and  roll  angles  in  order  to  effect  changes  in  the  position  and  velocity  of 
the  quadrotor.  In  particular,  the  pitch  and  roll  commands  are  selected  from  a  discrete  set,  thus 
resulting  in  a  switching  control  problem.  Under  these  commands,  the  relative  dynamics  between 
the  quadrotor  and  the  ground  target  can  be  modeled  as  follows. 


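\[
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{y}_1 \\ \dot{y}_2 \end{bmatrix}
=
\begin{bmatrix} x_2 + d_1 \\ g\sin(\phi) + d_2 \\ y_2 + d_3 \\ g\sin(-\theta) + d_4 \end{bmatrix}
\]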

Figure 3.4: Setup of hover control experiments. Here the ground target is a radio-controlled car.

Here, the state variables $x_1$, $x_2$ and $y_1$, $y_2$ denote the relative position and velocity between the quadrotor and the ground target, in the x and y directions, respectively; $\phi$ and $\theta$ are the roll and pitch angles of the quadrotor; $g$ denotes the gravitational constant; $d = (d_1, d_2, d_3, d_4)$ is a set of disturbance parameters. In particular, $d_1$ and $d_3$ are used to capture the effects of unmodelled dynamics, while $d_2$ and $d_4$ represent motor noise and also the acceleration of the ground target.
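As a minimal sketch of the relative dynamics above, the following Python fragment evaluates the right-hand side and advances the state by one sampling interval. The numerical value of g, the forward Euler scheme, and the function names are assumptions made for illustration; the text does not specify how the model is integrated.

```python
import math
from typing import Tuple

State = Tuple[float, float, float, float]  # (x1, x2, y1, y2)
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def relative_dynamics(state: State, phi: float, theta: float, d: State) -> State:
    """Right-hand side of the relative dynamics; angles in radians."""
    _, x2, _, y2 = state
    d1, d2, d3, d4 = d
    return (x2 + d1, G * math.sin(phi) + d2, y2 + d3, G * math.sin(-theta) + d4)


def euler_step(state: State, phi: float, theta: float, d: State, dt: float = 0.1) -> State:
    """One forward Euler step (an assumed integration scheme, for illustration only)."""
    rates = relative_dynamics(state, phi, theta, d)
    return tuple(s + dt * r for s, r in zip(state, rates))
```

For example, a roll command of 10 degrees with zero disturbance gives an x-acceleration of about 1.70 m/s² under the assumed value of g.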

In the experiment scenario, the hover region can be encoded as a set $W^H \subset \mathbb{R}^4$ centered around the origin in the relative position-velocity space. Similarly, the velocity constraint can be encoded as an avoid region $A^H \subset \mathbb{R}^4$. A brief summary of the problem parameters is given below.

• Hover Region ($W^H$): $|x_1|, |y_1| < 0.3$ m, $|x_2|, |y_2| < 0.5$ m/s

• Avoid Region ($A^H$): $|x_2|, |y_2| > 1$ m/s

• Time Step ($T$): 0.1 seconds

• Time Horizon for Reaching $W^H$ ($N$): 25 time steps

• Range of Attitude Commands ($\phi$, $\theta$): -10, -7.5, -5, -2.5, 0, 2.5, 5, 7.5, 10 degrees

• Disturbance Bounds: $|d_1|, |d_3| < 0.1$ m/s, $|d_2|, |d_4| < 0.5$ m/s²

It  is  important  to  note  that  the  choice  of  disturbance  bounds  is  a  trade-off  between  the  level  of 
robustness  and  the  feasibility  of  the  control  problem.  Although  a  larger  disturbance  bound  may 
account  for  a  wider  range  of  uncertainties,  the  feasible  set  for  the  controller  would  in  general  also 
be smaller (possibly empty if the bound is sufficiently large). The bounds given here for $d_1$, $d_3$ represent about ±10% of the maximum allowed velocity, while the bounds on $d_2$, $d_4$ represent about ±30% of the maximum allowed acceleration.
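The quoted percentages can be checked with a short calculation. The sketch below assumes that the "maximum allowed acceleration" is the one produced by the largest attitude command, g·sin(10°), and that g = 9.81 m/s²; neither assumption is stated explicitly in the text.

```python
import math

G = 9.81             # m/s^2, assumed value of gravitational acceleration
V_MAX = 1.0          # m/s, boundary of the avoid region
MAX_TILT_DEG = 10.0  # largest attitude command in the discrete command set

# Assumption: the maximum allowed acceleration is g * sin(10 degrees).
a_max = G * math.sin(math.radians(MAX_TILT_DEG))

velocity_ratio = 0.1 / V_MAX       # bound on d1, d3 relative to V_MAX
acceleration_ratio = 0.5 / a_max   # bound on d2, d4 relative to a_max
```

Under these assumptions the ratios come out to 10% and roughly 29%, consistent with the figures quoted above.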

We  observe  that  the  hover  control  problem  as  defined  above  is  a  particular  instantiation  of 
Problem 2.2 given in section 2.4.2 for a semiautomated sequential transition system. In particular, the problem can be separated into two stages. During the first stage, the objective is to reach the hover region $W^H$ in finite time while avoiding $A^H$ (a reach-avoid problem). During the second stage, the objective is to remain inside the hover region $W^H$ (an invariance problem). Thus, we can employ a design procedure that is specialized from the one given in section 2.6.2.

1. Use the method in section 3.4.3 to compute an infinite horizon safe set $G_{Safe}$ with respect to $W^H$ and an infinite horizon safe control policy $F_{Safe}$ with respect to $G_{Safe}$.

2. If $G_{Safe} \neq \emptyset$, choose a target set $R^H \subset G_{Safe}$.

3. Use the method in section 3.5 to compute a horizon-$N$ reach-avoid set $G_{RA}$ with respect to $R^H$ and $A^H$, as well as a horizon-$N$ reach-avoid control policy $F_{RA}$ with respect to $G_{RA}$.

4. Choose a switching policy as follows. In stage 1, select control inputs according to $F_{RA}$ until $(q(kT),x(kT)) \in R^H$ for some $k \in \{0,1,\ldots,N\}$, then switch to $F_{Safe}$.
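The switching rule in step 4 can be sketched as a small wrapper around two policies. This is an illustrative sketch only: the policies are passed in as arbitrary callables, the function names are hypothetical, and the target box uses the values reported for the experiments later in this section.

```python
from typing import Callable, Tuple

State = Tuple[float, float, float, float]  # (x1, x2, y1, y2)


def in_target(state: State) -> bool:
    """Membership in the target set, using the box reported for the experiments."""
    x1, x2, y1, y2 = state
    return abs(x1) < 0.2 and abs(y1) < 0.2 and abs(x2) < 0.2 and abs(y2) < 0.2


def make_switching_policy(
    reach_avoid_policy: Callable[[State], object],
    safe_policy: Callable[[State], object],
    target_test: Callable[[State], bool] = in_target,
) -> Callable[[State], object]:
    """Use the reach-avoid policy until the target is first reached, then the safe policy."""
    reached = False

    def policy(state: State) -> object:
        nonlocal reached
        if not reached and target_test(state):
            reached = True  # the switch is latched once the target is entered
        return safe_policy(state) if reached else reach_avoid_policy(state)

    return policy
```

The latch reflects the wording of step 4, in which the controller switches once and then stays with the safe policy; a design that re-enters stage 1 after a violation would drop the latch.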

Following this procedure, we first compute an infinite horizon safe set $G_{Safe} \subset W^H$ for which the hover objective is feasible. This set is plotted in Figure 3.5. For the finite horizon reach-avoid problem, we select the target set as $R^H = \{(x_1,x_2,y_1,y_2) : |x_1|,|y_1| < 0.2\ \text{m},\ |x_2|,|y_2| < 0.2\ \text{m/s}\} \subset G_{Safe}$.

Figure 3.5: Infinite horizon safe set (dashed line) computed for hover objective. Inner rectangle is the target region chosen for reach-avoid problem.

Using the procedures given in section 3.5, we then compute the set of initial conditions which can reach the target set $R^H$ within the time horizon of interest, while satisfying the velocity constraint encoded by $A^H$. Some examples of the sets $S_j$, $j = 1,2,\ldots,25$, as generated by Algorithm 3.5.1, are plotted in Figure 3.6.

Figure 3.6: Finite horizon reach-avoid sets (dashed lines): (a) sets $S_1$ (inner-most line) through $S_{10}$ (outer-most line); (b) sets of higher index (inner-most line) through $S_{25}$ (outer-most line).


The level set representations of the safe sets and reach-avoid sets, as computed in an offline setting, are stored in lookup table form in the on-board computer. During the experiments, the online selection of control inputs is carried out through an implementation of Algorithms 3.4.2 and 3.5.2. In particular, we obtain sampled measurements of the quadrotor and ground target positions from a VICON camera system. The position measurements are used to estimate the velocity through a first order finite difference scheme. These state values are then used to compute the relative states and select the appropriate pitch and roll commands by checking containment of the current state in particular safe sets or reach-avoid sets. As discussed in sections 3.4.2 and 3.5.2, this check amounts to evaluating inequalities with respect to level set representations of stored reachable sets. Given that the VICON system resolves positions to the order of $10^{-3}$ m, the assumption on precise state measurements is reasonably accurate for these experiments.
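The state estimation step described above can be sketched as follows. This is a minimal illustration, assuming that relative quantities are defined as quadrotor minus ground target and that a backward difference over one sampling interval is used; the function name and argument layout are hypothetical.

```python
from typing import Tuple

Position = Tuple[float, float]             # (x, y) in metres
State = Tuple[float, float, float, float]  # (x1, x2, y1, y2)


def relative_state(
    quad_prev: Position,
    quad_curr: Position,
    target_prev: Position,
    target_curr: Position,
    dt: float,
) -> State:
    """Relative position and first order finite difference velocity.

    Assumption: relative quantities are quadrotor minus ground target.
    """
    rel_prev = (quad_prev[0] - target_prev[0], quad_prev[1] - target_prev[1])
    rel_curr = (quad_curr[0] - target_curr[0], quad_curr[1] - target_curr[1])
    x2 = (rel_curr[0] - rel_prev[0]) / dt  # backward difference in x
    y2 = (rel_curr[1] - rel_prev[1]) / dt  # backward difference in y
    return (rel_curr[0], x2, rel_curr[1], y2)
```

With millimetre-level position resolution and a 0.1 second sampling interval, a difference quotient of this kind carries a velocity quantization error on the order of 0.01 m/s.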

The results of an experimental trial in which the ground target is stationary are shown in Figure 3.7. Here the quadrotor is initialized at a state $(x_1,x_2,y_1,y_2) = (1, 0, 1.1, 0)$ (positions in m, velocities in m/s), inside the reach-avoid set $S_{25}$, with the ground target placed at the origin. In the first stage of the experiment, the reach-avoid controller is shown to drive the system trajectory inside the target set $R^H$ within about 1.8 seconds (the allowed time horizon is 2.5 seconds), without exceeding the admissible velocity bounds of ±1 m/s. For the second stage of the experiment, the safety controller is shown to keep the system state within the hover region $W^H$ for almost the entire remaining 33 seconds of the experiment, except for a brief violation of about 0.2 seconds in duration. The violation can be in part attributed to an observed lag in system response under attitude control commands, which can be accounted for either through a higher order system model or by enlarging the disturbance bound estimates reported here.

From a plot of the attitude commands issued during the first 5 seconds of this experiment (see Figure 3.8), it can be observed that the reach-avoid controller has the characteristics of a minimum-time-to-reach controller. Namely, an aggressive acceleration action is applied until the state trajectory approaches the velocity constraint, after which an aggressive deceleration action is applied as the quadrotor nears the origin. On the other hand, during the hover phase, the safety controller has the characteristics of a least restrictive controller. Namely, it intervenes only when there is a possibility that the state trajectory will exit the hover region.

Figure 3.7: Results from hover control experiment over 35 seconds: (a) x-position (m) and x-velocity (m/s) trajectory of STARMAC; (b) y-position (m) and y-velocity (m/s) trajectory of STARMAC.


The trajectory plot from an experimental trial with a moving ground vehicle is shown in Figure 3.9. In this case, the quadrotor first reaches the hover region over the ground vehicle within 2.1 seconds and then proceeds to track the unplanned movements of the ground vehicle over the
course  of  approximately  44  seconds.  The  results  show  that  the  quadrotor  vehicle  indeed  remains 
within  the  hover  region,  except  for  two  brief  violations  due  to  occasional  bursts  of  acceleration 
by  the  ground  vehicle  not  accounted  for  in  the  disturbance  bound  estimates.  As  in  the  previous 
experiment,  the  hover  region  is  quickly  recovered  within  0.1s  and  0.6s,  respectively,  using  the 
reach-avoid  control  law  (3.16). 


3.7  Application  to  Sequential  Reachability  Problems 

In order to illustrate how the controller synthesis procedure described in this chapter can be applied to problems with sequential reachability objectives, we will revisit in this section the example of automated aerial refueling (AAR) as introduced in Section 2.8. In particular, we consider the problem of synthesizing the control law for each maneuver in the refueling sequence as a switching control policy between a finite set of flight modes.

Figure 3.8: Control input plots for hover control experiment over 5 second interval.

Figure 3.9: Results from car following experiment: (a) x-position (m) and y-position (m) trajectories of STARMAC and ground vehicle over 44 seconds; (b) snapshot of trajectories at t = 19.5 seconds.

More specifically, we assume the system dynamics and the specifications of target sets and avoid sets as given in Section 2.8 of the preceding chapter. For each maneuver in the refueling sequence, we will use the two-mode flight control system from section 3.2 for the synthesis of UAV controls (see Figure 3.1) to satisfy the reach-avoid objective. In the straight mode, the linear velocity bounds are given by $[\underline{u}_1, \overline{u}_1] = [40, 113]$ m/s, with three quantization levels; while in the turn mode, the angular velocity bounds are given by $[\underline{u}_2, \overline{u}_2] = [-\pi/6, \pi/6]$ rad/s, with two quantization levels.

Now consider the sequential reachability problem with reach-avoid objectives (Problem 2.1), within the context of AAR. The control design can be carried out as a specialization of the procedure outlined in section 2.6.1, starting with the Rejoin maneuver ($j = 6$).

1. Use the method in section 3.5 to compute a reach-avoid set $G^{N_j}_{RA}$ with respect to $R_j$ and $A_j$, until the first integer $N_j$ such that $R_{j-1} \subset G^{N_j}_{RA}$.

2. Use equation (3.16) to synthesize a reach-avoid control policy $F^{j}_{RA}$ with respect to $G^{N_j}_{RA}$.

3. Repeat the above steps for maneuver $j - 1$ until the Detach 1 maneuver ($j = 1$). For Detach 1, set $R_0 = A_0$.

4. Choose a switching policy as follows. In maneuvers $j = 0,1,\ldots,5$, select control inputs according to $F^{j}_{RA}$ until $(q(kT),x(kT)) \in R_j$ for some $k \in \{0,1,\ldots,N_j\}$, then reset $k$ to zero and switch to $F^{j+1}_{RA}$.
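The backward design loop in steps 1 to 3 can be sketched on a finite toy model. The sketch below is only an illustration under strong simplifying assumptions: the state space is a finite set of cells, there is no disturbance, and `successors[s]` lists the cells reachable from `s` in one step under some input. It is not the level set computation of Algorithm 3.5.1, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reach_avoid_horizon(
    target: Set[int],
    avoid: Set[int],
    prev_target: Set[int],
    successors: Dict[int, Set[int]],
    max_steps: int,
) -> Tuple[int, Set[int]]:
    """Grow the reach-avoid set step by step until it contains prev_target."""
    reach = set(target) - set(avoid)
    for k in range(max_steps + 1):
        if set(prev_target) <= reach:
            return k, reach  # first horizon at which prev_target is covered
        reach = reach | {
            s
            for s, nxt in successors.items()
            if s not in avoid and any(n in reach for n in nxt)
        }
    raise ValueError("previous target set not covered within max_steps")


def design_sequence(
    targets: List[Set[int]],
    avoid: Set[int],
    successors: Dict[int, Set[int]],
    max_steps: int = 50,
) -> Dict[int, int]:
    """Work backwards from the last maneuver, recording the horizon N_j of each."""
    horizons = {}
    for j in range(len(targets) - 1, 0, -1):
        horizons[j], _ = reach_avoid_horizon(
            targets[j], avoid, targets[j - 1], successors, max_steps
        )
    return horizons


# Toy example: ten cells on a line; from cell s some input reaches s-1, s or s+1.
successors = {s: {max(s - 1, 0), s, min(s + 1, 9)} for s in range(10)}
horizons = design_sequence([{0}, {3}, {6}, {9}], avoid=set(), successors=successors)
```

In this toy example each target lies three cells from the previous one, so every maneuver receives a horizon of three steps. Covering the previous target set is what allows the controller of one maneuver to hand over to the next.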

A reach-avoid set calculation is performed using Algorithm 3.5.1, with sampling interval $T = 0.1$ seconds, for each of the refueling maneuvers. The reach-avoid set for the Contact maneuver
is  shown  in  Figure  3.10,  computed  over  a  time  horizon  of  2.1  seconds.  Note  that  for  convenience, 
we  translated  the  target  set  to  the  origin  in  these  computations. 


Figure 3.10: Finite horizon reach-avoid set for Contact maneuver: (a) surface plot in relative coordinate space; (b) cross-section at relative angle $x_3 = 0$ degrees.


To validate the resulting reach-avoid sets, a trajectory simulation of the entire aerial refueling sequence is performed. At the beginning of each sampling interval, the UAV receives a state measurement of the relative position and velocity and selects a control input according to the feedback law (3.16). The tanker aircraft then selects a random velocity input from within its input bounds. The system dynamics are then integrated forward in time according to equation (2.8). The simulation results are given in Figure 3.11, and the plot of the state trajectory in the relative coordinate space is shown in Figure 3.12.


Figure  3.11:  Automated  aerial  refueling  sequence  simulation  sample  run. 

As can be seen, the UAV successfully avoids a collision with the tanker aircraft, regardless of the random fluctuations of tanker velocity, and completes the entire refueling sequence (excluding the time spent refueling) within 7 seconds. Although not investigated here, extensions to invariance objectives can be carried out by designing infinite horizon safety controllers with respect to the target neighborhoods $W_j$, and ensuring compatibility between the stationary and transition modes through a specialization of the procedure given in section 2.6.2.

Figure 3.12: Refueling sequence trajectory simulation in relative coordinate space: (a) side view; (b) top-down view.

Part II

Discrete Time Stochastic Hybrid Systems

Chapter 4

Stochastic Game Formulation of Probabilistic Reachability
4.1  Overview  and  Related  Work 

In  the  second  part  of  the  dissertation,  we  will  shift  our  focus  to  probabilistic  reachability  problems 
for  stochastic  hybrid  systems  (SHS).  The  primary  difference  between  a  stochastic  hybrid  system 
as  compared  with  a  deterministic  hybrid  automaton  lies  in  the  model  of  uncertainty.  In  the  case 
of  a  deterministic  hybrid  system  model,  such  as  the  one  discussed  in  section  2.3.1,  uncertainty  in 
system  dynamics  is  captured  through  the  notion  of  sets.  In  particular,  the  set  of  admissible  distur¬ 
bance  inputs  as  specified  by  the  disturbance  input  spaces,  along  with  the  set  of  admissible  discrete 
transitions  as  specified  by  the  reset  relation,  implicitly  define  a  set  of  admissible  system  trajectories 
under  a  particular  choice  of  initial  condition  and  control  input.  In  the  case  of  a  stochastic  hybrid 
system  model,  such  as  proposed  in  Altman  and  Gaitsgory  (1997);  Hu  et  al.  (2000);  Bujorianu  and 
Lygeros  (2004);  Amin  et  al.  (2006),  uncertainty  in  state  evolution  is  captured  through  the  notion 
of  probability  distributions.  Speaking  somewhat  informally,  through  the  introduction  of  transition 
probabilities,  stochastic  differential/difference  equations,  or  transition  rates,  one  implicitly  defines 
a  probability  distribution  over  the  set  of  possible  executions  of  a  hybrid  system. 

From  a  theoretical  standpoint,  probabilistic  models  can  be  viewed  as  a  generalization  of  deter¬ 
ministic  models.  Namely,  the  support  of  a  probability  distribution  can  be  interpreted  as  the  set  of 
possible  outcomes,  while  the  distribution  itself  can  be  interpreted  as  a  quantitative  measure  of  the 
likelihood  of  possible  outcomes.  It  is  then  tempting  to  conclude  that  deterministic  systems  can  be 
studied as special cases of stochastic systems. In practice, however, analysis and computational tools developed for stochastic systems often do not specialize directly to deterministic systems, due to the fact that deterministic dynamics result in degenerate transition probabilities (i.e. probability distributions with mass concentrated at a single point). Furthermore, even if it were possible in certain instances to adapt stochastic techniques to deterministic systems, the process of doing so may overcomplicate the analysis and obscure the intuition behind deterministic problems. Thus, the methods that will be presented for addressing probabilistic reachability problems should not be interpreted as a generalization of methods addressing deterministic reachability problems. Rather, they should be viewed as an adaptation of reachability analysis to different application scenarios.

Within  an  application  context,  there  are  several  possible  reasons  for  employing  a  probabilistic 
model as compared with a deterministic model. For cases in which the possible variations in system behavior are not known a priori or cannot be conservatively estimated, it may be necessary to model the uncertainties through statistical analysis of empirical data, collected from multiple runs of the system. The resulting model would then have a probabilistic interpretation, namely the probability distribution over the set of possible executions is a quantitative measure of the likelihood that subsets of trajectories, referred to as events, will occur. There are also cases in which
the  disturbances  affecting  system  dynamics  are  known  to  fluctuate  within  a  large  range  (e.g.  wind 
effects  on  aircraft  trajectory).  For  such  cases,  if  one  were  to  design  controllers  with  respect  to  the 
worst-case  realizations  of  the  disturbances,  the  resulting  controllers  may  be  overly  conservative. 
To  reduce  the  conservatism,  one  may  consider  adopting  a  probabilistic  disturbance  model,  which 
provides  an  estimate  for  the  likelihood  of  possible  disturbance  events.  Controller  design  can  be 
then  carried  out  with  respect  to  probabilistic  measures  of  performance.  Finally,  if  one’s  objective 
is  to  model  the  aggregate  behavior  of  a  large  scale  system,  for  example  the  behavior  of  economic 
indices,  then  a  natural  modeling  framework  would  be  that  of  a  stochastic  system.  In  such  cases, 
the  quantitative  values  of  the  state  variables  are  generated  by  the  outcomes  of  a  large  number  of 
concurrent  dynamic  processes  (e.g.  economic  activities).  As  a  complete  deterministic  description 
of  such  processes  would  be  intractable  for  analysis  and  decision  making,  a  statistical  model  is  often 
employed  instead.  In  particular,  when  a  variable  of  interest  corresponds  to  a  sum  or  an  average  of 
quantities  generated  by  the  underlying  processes,  then  a  Gaussian  model  is  a  reasonable  first  order 
approximation  by  the  Central  Limit  Theorem. 

In  the  hybrid  systems  literature,  stochastic  models  have  been  proposed  for  application  scenar¬ 
ios  ranging  from  air  traffic  management  (Glover  and  Lygeros,  2004),  communication  networks 
(Hespanha,  2004),  to  systems  biology  (Hu  et  al.,  2004).  For  a  controlled  SHS,  the  performance  of 
the  closed-loop  system  can  be  measured  in  terms  of  the  probability  that  the  system  trajectory  obeys 
certain  desired  specifications.  Of  interest  to  safety-critical  applications  are  probabilistic  safety  and 
reachability  problems  in  which  the  control  objective  is  to  maximize  the  probability  of  remaining 
within  a  certain  safe  set  or  of  reaching  a  desired  target  set.  In  the  continuous-time  case,  a  theo¬ 
retical  upper  bound  on  the  reachability  probability  is  derived  in  Bujorianu  (2004)  using  Dirichlet 
forms.  The  temporal  evolution  of  the  probability  density  function  of  the  hybrid  state  has  been 
characterized  through  generalized  Fokker-Planck  equations  (Beet  et  al.,  2006).  Optimal  control  of 
stochastic  hybrid  systems  is  considered  in  Bensoussan  and  Menaldi  (2000)  and  quasi-variational 
inequalities  based  on  dynamic  programming  are  derived  for  the  optimal  trajectory.  An  optimal 
control approach towards reachability analysis is discussed in Koutsoukos and Riley (2006) and Mohajerin Esfahani et al. (2011), in which the solutions of probabilistic safety and reachability problems are derived in terms of the viscosity solutions of appropriate Hamilton-Jacobi-Bellman equations. To address the computational issues associated with probabilistic reachability analysis,
the  authors  in  Hu  et  al.  (2005)  propose  a  Markov  chain  approximation  of  the  SHS  using  methods 
from  Kushner  and  Dupuis  (1992),  while  in  Prajna  et  al.  (2007),  the  authors  discuss  an  approach 
for computing an upper bound on the safety probability using barrier certificates. For discrete-time stochastic hybrid systems (DTSHS), a theoretical framework for the study of probabilistic safety
problems  is  established  in  Abate  et  al.  (2008).  These  results  are  generalized  in  Summers  and 
Lygeros  (2010)  to  address  the  reach-avoid  problem,  in  which  the  control  objective  is  to  reach  a  de¬ 
sired  target  set,  while  remaining  within  a  safe  set.  Considerations  for  time-varying  and  stochastic 
sets  are  discussed  in  Abate  et  al.  (2006)  and  Summers  et  al.  (2011)  respectively. 

Recently, we extended the study of probabilistic safety and reachability for DTSHS, as carried out in Abate et al. (2008) and Summers and Lygeros (2010), to a zero-sum stochastic game setting (Kamgarpour et al., 2011). In particular, we considered a scenario in which the evolution of the system state is affected not only by the actions of the control (as in previous work), but also by the actions of a rational adversary, whose objectives are opposed to those of the control. This is motivated by
practical  applications  such  as  conflict  resolution  in  air  traffic  management  (Tomlin  et  al.,  2002)  and 
control  of  networked  systems  subject  to  external  attacks  (Amin  et  al.,  2009),  in  which  the  intent 
of  certain  rational  agents  may  be  uncertain.  In  addition,  the  framework  is  applicable  to  robust 
control  applications,  in  which  there  may  be  unmodeled  dynamics  whose  probability  distribution  is 
not  known  a  priori.  For  such  cases,  a  dynamic  programming  result  was  stated,  without  proof,  for 
determining  the  maximal  probability  of  satisfying  the  reach-avoid  objective,  subject  to  the  worst- 
case  adversary  behavior,  referred  to  as  the  max-min  reach-avoid  probability. 

The discussion in this chapter significantly expands upon the basic problem formulation and the statement of the dynamic programming result given in Kamgarpour et al. (2011). In terms
of  problem  formulation,  a  formal  interpretation  is  given  for  the  max-min  probability  as  the  value 
of  a  zero-sum  Stackelberg  stochastic  game  with  the  control  as  the  leader.  We  then  provide  a 
detailed  proof  of  the  dynamic  programming  result  for  the  existence  and  computation  of  this  value. 
In  the  process  of  the  proof,  sufficient  conditions  of  optimality  are  derived  for  both  the  control 
and  the  adversary.  Furthermore,  it  is  briefly  discussed  how  this  result,  shown  for  the  case  of  the 
reach-avoid  problem,  can  be  specialized  to  address  the  safety  problem.  For  applications  with  less 
conservative  assumptions  on  the  disturbance,  we  also  investigate  the  implications  of  considering 
alternative  information  patterns  in  the  problem  formulation.  In  particular,  it  is  shown  that  the 
existence  of  value  under  symmetric  information  patterns  in  general  requires  randomized  player 
policies.  Finally,  we  discuss  in  detail  the  infinite  horizon  properties  of  the  dynamic  programming 
algorithms,  and  provide  some  results  on  the  computation  of  the  infinite  horizon  value  and  the 
existence  of  infinite  horizon  optimal  policies. 

In  comparison  with  existing  results  in  literature,  our  main  contributions  are  summarized  as 
follows.  First,  by  introducing  adversarial  inputs  into  the  system  model,  we  formulate  a  modeling 
framework  which  allows  analysis  of  hybrid  systems  with  both  stochastic  and  bounded  uncertain¬ 
ties.  Second,  through  our  dynamic  programming  result,  we  establish  a  basis  for  computational 
algorithms  addressing  probabilistic  safety  and  reachability  problems  posed  under  this  modeling 
framework.  Third,  the  proof  of  our  main  result  presents  a  generalization  of  the  stochastic  optimal 
control arguments employed in Abate et al. (2008) and Summers and Lygeros (2010) for single-player probabilistic safety and reachability problems. In particular, measurability properties, which are vital for ensuring that the probabilities of interest can be computed by a recursive procedure, are more difficult to establish in a stochastic game setting as compared with a single-player setting (Nowak, 1985). Thus, our dynamic programming arguments require the use of results from the analysis of zero-sum stochastic games (Shapley, 1953; Maitra and Parthasarathy, 1970; Kumar and Shiau, 1981; Nowak, 1985; Rieder, 1991; Maitra and Sudderth, 1998; Gonzalez-Trejo et al., 2002), with adjustments to account for the sum-multiplicative form of our utility function and the asymmetric information pattern in a max-min control problem.

This  chapter  is  organized  as  follows.  In  section  4.2,  we  discuss  the  model  for  a  discrete-time 
stochastic  hybrid  game  (DTSHG).  In  section  4.3,  we  give  a  formal  stochastic  game  formulation 
of  the  probabilistic  reach-avoid  problem.  In  section  4.4,  we  state  and  prove  our  main  result  for 
computing  the  max-min  reach-avoid  probability,  and  give  sufficient  conditions  of  optimality  for 
both  the  control  and  the  adversary.  This  is  followed  by  the  specialization  of  this  result  to  the  safety 
problem.  In  section  4.5,  we  consider  the  implications  of  alternative  information  patterns  on  the 
existence  of  value  and  optimal  policies.  In  section  4.6,  we  discuss  the  extension  of  the  results 
to  infinite  horizon  reachability  problems.  In  section  4.7,  the  proposed  methodology  is  applied  to 
stochastic  formulations  of  the  target  tracking  and  aircraft  conflict  resolution  problems  as  considered 
in  chapter  3.  The  examples  are  used  to  illustrate  the  utility  of  stochastic  models,  the  computation 
of  max-min  probabilities  and  control  policies,  and  the  interpretation  of  the  dynamic  programming 
results  within  an  application  context. 

4.2  Discrete-Time  Stochastic  Hybrid  Game  Model 

The  model  for  a  discrete-time  stochastic  hybrid  game  (DTSHG)  as  described  in  this  section  is  an 
extension  of  the  discrete-time  stochastic  hybrid  systems  (DTSHS)  model  proposed  in  Abate  et  al. 
(2008);  Summers  and  Lygeros  (2010)  to  a  two-player  stochastic  game  setting.  As  in  previous  work, 
we require the stochastic transition kernels to be Borel-measurable and denote by $\mathcal{B}(\cdot)$ the Borel $\sigma$-algebra. This condition ensures that the probabilities of interest can be computed by integration
of  the  transition  kernels  over  a  hybrid  state  space.  Following  standard  conventions  in  two-player 
games,  we  refer  to  the  control  as  player  I  and  the  adversary  as  player  II. 

Definition 4.1 (DTSHG). A discrete-time stochastic hybrid game between two players is a tuple $\mathcal{H} = (Q, n, C_a, C_b, \nu_x, \nu_q, \nu_r)$, defined as follows.

• Discrete state space $Q := \{q_1, q_2, \ldots, q_m\}$, $m \in \mathbb{N}$;

• Dimension of continuous state space $n : Q \to \mathbb{N}$: a map which assigns to each discrete state $q \in Q$ the dimension of the continuous state space. The hybrid state space is given by $S := \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$;

• Player I controls $C_a$: a nonempty, compact Borel space;

• Player II controls $C_b$: a nonempty, compact Borel space;

• Continuous state transition kernel $\nu_x : \mathcal{B}(\mathbb{R}^{n(\cdot)}) \times S \times C_a \times C_b \to [0,1]$: a Borel-measurable stochastic kernel on $\mathbb{R}^{n(\cdot)}$ given $S \times C_a \times C_b$, which assigns to each $s = (q,x) \in S$, $a \in C_a$ and $b \in C_b$ a probability measure $\nu_x(\cdot|s,a,b)$ on the Borel space $(\mathbb{R}^{n(q)}, \mathcal{B}(\mathbb{R}^{n(q)}))$;

• Discrete state transition kernel $\nu_q : Q \times S \times C_a \times C_b \to [0,1]$: a Borel-measurable discrete stochastic kernel on $Q$ given $S \times C_a \times C_b$ which assigns to each $s \in S$ and $a \in C_a$, $b \in C_b$ a probability distribution $\nu_q(\cdot|s,a,b)$ over $Q$;

• Reset transition kernel $\nu_r : \mathcal{B}(\mathbb{R}^{n(\cdot)}) \times S \times C_a \times C_b \times Q \to [0,1]$: a Borel-measurable stochastic kernel on $\mathbb{R}^{n(\cdot)}$ given $S \times C_a \times C_b \times Q$ which assigns to each $s \in S$, $a \in C_a$, $b \in C_b$, and $q' \in Q$ a probability measure $\nu_r(\cdot|s,a,b,q')$ on the Borel space $(\mathbb{R}^{n(q')}, \mathcal{B}(\mathbb{R}^{n(q')}))$.

In  contrast  with  the  single-player  case,  the  stochastic  transition  kernels  in  a  DTSHG  are  affected 
by  the  inputs  of  two  agents  with  possibly  differing  objectives.  In  particular,  we  assume  that  player 
I  and  player  II  are  non-cooperative  and  consider  a  conservative  decision  model  in  which  the  actions 
of  player  II  may  be  chosen  in  a  rational  fashion  based  upon  the  actions  of  player  I. 

Definition 4.2. A Markov policy for player I is a sequence $\mu = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ of Borel measurable maps $\mu_k : S \to C_a$, $k = 0, 1, \ldots, N-1$. The set of all admissible Markov policies for player I is denoted by $\mathcal{M}_a$.

Definition 4.3. A Markov strategy for player II is a sequence $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_{N-1})$ of Borel measurable maps $\gamma_k : S \times C_a \to C_b$, $k = 0, 1, \ldots, N-1$. The set of all admissible Markov strategies for player II is denoted by $\Gamma_b$.

The  scenario  described  here  is  a  common  setting  in  robust  control  problems  in  which  the  control 
selects  inputs  in  anticipation  of  the  worst-case  response  by  an  adversary  or  a  disturbance.  More 
formally,  this  can  be  interpreted  as  a  zero-sum  Stackelberg  game  in  which  player  I  is  the  leader. 
Due  to  the  asymmetry  in  information  in  a  Stackelberg  game,  equilibrium  strategies  of  a  zero-sum 
game  can  be  typically  chosen  to  be  deterministic  rather  than  randomized  (Breton  et  al.,  1988). 
We  note,  however,  that  in  a  zero-sum  stochastic  game  with  symmetric  information  (the  actions  of 
player  I  are  not  revealed  to  player  II),  the  existence  of  a  non-cooperative  equilibrium  in  general 
requires  randomized  strategies  (see  for  example  Shapley,  1953;  Maitra  and  Parthasarathy,  1970). 
This  case  is  discussed  in  section  4.5.  Furthermore,  if  one  were  to  consider  transition  probabilities 
and  utility  functions  which  depend  on  the  entire  history  of  the  game,  it  may  also  be  necessary  to 
broaden  the  class  of  player  strategies  to  encompass  non-Markov  policies  (Rieder,  1991;  Maitra 
and  Sudderth,  1998).  However,  as  shown  in  Rieder  (1991),  when  the  time  horizon  is  finite,  the 
transition  probabilities  are  Markovian,  and  the  utility  function  is  sum-multiplicative,  it  is  sufficient 
to  consider  the  class  of  Markov  control  policies.  The  consideration  of  infinite  horizon  problems, 
on  the  other  hand,  in  general  requires  semi-Markov  control  policies,  which  depend  on  the  initial 
condition.  This  case  is  discussed  in  section  4.6. 

For a given initial condition $s(0) = (q_0, x_0) \in S$, player I policy $\mu \in \mathcal{M}_a$, and player II strategy $\gamma \in \Gamma_b$, the semantics of a DTSHG can be described as follows. At time step $k$, each player obtains a measurement of the current system state $s(k) = (q(k), x(k)) \in S$. Using this information, player I selects a control input $a(k) = \mu_k(s(k))$, following which player II selects a disturbance input $b(k) = \gamma_k(s(k), a(k))$. The discrete state is then updated according to the discrete transition kernel as $q(k+1) \sim \nu_q(\cdot|s(k),a(k),b(k))$. If the discrete state remains the same, namely $q(k+1) = q(k)$, then the continuous state is updated according to the continuous state transition kernel as $x(k+1) \sim \nu_x(\cdot|s(k),a(k),b(k))$. On the other hand, if there is a discrete jump, the continuous state is instead updated according to the reset transition kernel as $x(k+1) \sim \nu_r(\cdot|s(k),a(k),b(k),q(k+1))$.

Following this description, we can compose the transition kernels $\nu_x$, $\nu_q$, and $\nu_r$ to form a hybrid state transition kernel $\nu : \mathcal{B}(S) \times S \times C_a \times C_b \to [0,1]$ which describes the evolution of the hybrid state under the influence of player I and player II inputs:

$$\nu((q',dx')|(q,x),a,b) = \begin{cases} \nu_x(dx'|(q,x),a,b)\,\nu_q(q|(q,x),a,b), & \text{if } q' = q, \\ \nu_r(dx'|(q,x),a,b,q')\,\nu_q(q'|(q,x),a,b), & \text{if } q' \neq q. \end{cases}$$

Using  the  transition  kernel  v,  we  can  now  give  a  formal  definition  for  the  executions  of  a  DTSHG. 


Definition 4.4. Let $\mathcal{H}$ be a DTSHG and $N \in \mathbb{N}$ be a finite time horizon. For a given $\mu \in \mathcal{M}_a$, $\gamma \in \Gamma_b$, and $s_0 = (q_0, x_0) \in S$, a stochastic process $\{s(k), k = 0, \ldots, N\}$ with values in $S$ is an execution of $\mathcal{H}$ if its sample paths are generated according to Algorithm 4.2.1.


Algorithm 4.2.1 DTSHG Execution

Require: Initial condition $s_0 = (q_0, x_0) \in S$, player I policy $\mu \in \mathcal{M}_a$, player II strategy $\gamma \in \Gamma_b$;
Set $s(0) = s_0$;
for $k = 0$ to $N-1$ do
    Set $a(k) = \mu_k(s(k))$;
    Set $b(k) = \gamma_k(s(k), a(k))$;
    Extract from $S$ a value $s_{k+1}$ for $s(k+1)$ according to $\nu(\cdot|s(k),a(k),b(k))$;
end for
return Sample Path $\{s_k, k = 0, \ldots, N\}$.
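
The execution procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the kernels are assumed to be available as sampling functions, and the names `make_hybrid_sampler`, `execute`, `toggle_q`, `drift_x` and `reset_x` are hypothetical, not part of the model definition. The toy kernels at the bottom are deterministic so that the resulting path can be checked by hand.

```python
import random

def make_hybrid_sampler(sample_q, sample_x, sample_r):
    """Compose samplers for nu_q, nu_x, nu_r into a sampler for the hybrid kernel nu."""
    def sample_next(s, a, b, rng):
        q, _ = s
        q_next = sample_q(s, a, b, rng)
        if q_next == q:
            x_next = sample_x(s, a, b, rng)          # no jump: continuous kernel
        else:
            x_next = sample_r(s, a, b, q_next, rng)  # jump: reset kernel
        return (q_next, x_next)
    return sample_next

def execute(s0, mu, gamma, sample_next, N, rng):
    """Generate one sample path (s_0, ..., s_N) following the execution algorithm."""
    path = [s0]
    for k in range(N):
        s = path[-1]
        a = mu[k](s)        # player I acts on the current state
        b = gamma[k](s, a)  # player II acts on the state and on player I's input
        path.append(sample_next(s, a, b, rng))
    return path

# Toy usage with degenerate (deterministic) kernels, purely illustrative.
toggle_q = lambda s, a, b, rng: 1 - s[0]        # always jump to the other mode
drift_x = lambda s, a, b, rng: s[1] + a + b     # used only when the mode is kept
reset_x = lambda s, a, b, q_next, rng: 0.0      # reset to the origin after a jump

N = 3
mu = [lambda s: 1.0] * N
gamma = [lambda s, a: -a] * N
sampler = make_hybrid_sampler(toggle_q, drift_x, reset_x)
path = execute((0, 5.0), mu, gamma, sampler, N, random.Random(0))
```

Because player II's map takes `(s, a)` as arguments while player I's map takes only `s`, the information asymmetry of the Stackelberg setting is visible directly in the function signatures.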


By this definition, the execution of a DTSHG is a time inhomogeneous stochastic process on the sample space $\Omega = S^{N+1}$, endowed with the canonical product topology $\mathcal{B}(\Omega)$, formed from $N+1$ copies of $\mathcal{B}(S)$.

The evolution of the closed-loop hybrid state trajectory can be described in terms of the transition kernels $\nu^{\mu_k,\gamma_k}(\cdot|s) := \nu(\cdot|s,\mu_k(s),\gamma_k(s,\mu_k(s)))$, $k = 0, \ldots, N-1$. By Proposition 7.28 of Bertsekas and Shreve (1978), for a given initial condition $s \in S$, player I policy $\mu \in \mathcal{M}_a$, and player II strategy $\gamma \in \Gamma_b$, these stochastic kernels induce a unique probability measure $P_s^{\mu,\gamma}$ on the sample space $\Omega$:

$$P_s^{\mu,\gamma}(S_0 \times S_1 \times \cdots \times S_N) = \int_{S_0} \int_{S_1} \cdots \int_{S_N} \prod_{k=0}^{N-1} \nu^{\mu_k,\gamma_k}(ds_{k+1}|s_k)\, \delta_s(ds_0), \tag{4.1}$$

where $S_0, S_1, \ldots, S_N \in \mathcal{B}(S)$ are Borel sets and $\delta_s$ denotes the probability measure on $S$ which assigns unit mass to the point $s \in S$.

4.2.1  Example  -  2-mode  Jump  Markov  System 

Consider a simple jump Markov system with two modes of operation $Q = \{q_1, q_2\}$. The transitions between the discrete modes are modeled probabilistically, with the probability of dwelling in mode $q_i$ given by $p_i$, $i = 1, 2$. While in mode $q_i$, a continuous state $x \in \mathbb{R}$ evolves according to a stochastic difference equation $x(k+1) = f_i(x(k),a(k),b(k),w(k))$, defined as follows:

$$f_i(x(k),a(k),b(k),w(k)) = \begin{cases} 2x(k) + a(k) + b(k) + w(k), & i = 1, \\ \tfrac{1}{2}x(k) + a(k) + b(k) + w(k), & i = 2, \end{cases} \tag{4.2}$$

where $a$ and $b$ are player I and player II inputs, and $w$ is a random variable. It is assumed that the players have identical capabilities, with $a, b \in [-1, 1]$. The noise is modeled by a uniform distribution $w \sim \mathcal{U}[-1, +1]$.

Under the DTSHG modeling framework, the hybrid state space is $S = \{q_1, q_2\} \times \mathbb{R}$, and the players' input spaces are $C_a = C_b = [-1, 1]$. The discrete transition kernel $\nu_q$ is derived as $\nu_q(q_j|(q_i,x),a,b) = p_i$ if $q_i = q_j$, and $\nu_q(q_j|(q_i,x),a,b) = 1 - p_i$ otherwise. The continuous transition kernel $\nu_x$ can be derived from the continuous state dynamics (4.2) as

$$\nu_x(dx'|(q_1,x),a,b) \sim \mathcal{U}[2x + a + b - 1,\ 2x + a + b + 1],$$

$$\nu_x(dx'|(q_2,x),a,b) \sim \mathcal{U}[\tfrac{1}{2}x + a + b - 1,\ \tfrac{1}{2}x + a + b + 1].$$

Finally, the reset transition kernel is given by $\nu_r(dx'|(q,x),a,b,q') = \nu_x(dx'|(q,x),a,b)$.
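
As a hedged illustration, the kernels of this example can be written as samplers in Python. The dwell probabilities in `P_DWELL` are hypothetical placeholders (the text leaves $p_1$, $p_2$ unspecified), and the mode-2 gain of 1/2 in `GAIN` is an assumption about the coefficient in (4.2). The function names `sample_q`, `sample_x`, `step` and `support` are introduced here for illustration only.

```python
import random

P_DWELL = {1: 0.9, 2: 0.8}  # hypothetical values for p_1, p_2
GAIN = {1: 2.0, 2: 0.5}     # mode-2 gain of 1/2 is an assumption

def sample_q(s, a, b, rng):
    """Draw q(k+1) ~ nu_q: stay in the current mode with probability p_i."""
    q, _ = s
    return q if rng.random() < P_DWELL[q] else 3 - q

def sample_x(s, a, b, rng):
    """Draw x(k+1) ~ nu_x: mode gain times x, plus both inputs, plus U[-1, 1] noise."""
    q, x = s
    return GAIN[q] * x + a + b + rng.uniform(-1.0, 1.0)

def step(s, a, b, rng):
    """One draw from the hybrid kernel nu; the reset kernel coincides with nu_x here."""
    q_next = sample_q(s, a, b, rng)
    return (q_next, sample_x(s, a, b, rng))

def support(s, a, b):
    """Interval on which nu_x(.|s, a, b) is uniformly distributed."""
    q, x = s
    c = GAIN[q] * x + a + b
    return (c - 1.0, c + 1.0)

rng = random.Random(0)
s_next = step((1, 1.0), 0.5, -0.5, rng)
```

Since the reset kernel equals the continuous kernel in this example, `step` does not need to branch on whether a mode switch occurred.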


4.3  Problem  Formulation 


Within  the  context  of  a  DTSHG  model,  we  consider  stochastic  game  formulations  of  the  proba¬ 
bilistic  safety  and  reach-avoid  problems.  In  particular,  it  is  assumed  from  a  robust  control  stand¬ 
point  that  the  objective  of  player  I  is  to  maximize  the  probability  of  achieving  a  given  reachability 
specification,  while  the  objective  of  player  II  is  to  minimize  this  probability.  Thus,  the  safety  and 
reach-avoid  problems  for  a  DTSHG  become  zero-sum  stochastic  games.  Moreover,  due  to  the  fact 
that  player  II  is  allowed  to  select  inputs  in  response  to  the  actions  of  player  I,  they  fall  within  the 
class of zero-sum Stackelberg games (Breton et al., 1988; Başar and Olsder, 1999). A more precise
description  of  these  problems  is  given  below. 

First, consider the probabilistic safety problem. Assume that a Borel set $W \in \mathcal{B}(S)$ is given as a safe set. The probability that the sample path $(s_0, s_1, \ldots, s_N)$ remains in $W$ under fixed choices of $\mu \in \mathcal{M}_a$ and $\gamma \in \Gamma_b$ is given by

$$p_{s_0}^{\mu,\gamma}(W) := P_{s_0}^{\mu,\gamma}(\{(s_0, \ldots, s_N) : s_k \in W,\ \forall k \in [0,N]\}) = P_{s_0}^{\mu,\gamma}(W^{N+1}), \tag{4.3}$$

where we use $[0,N]$ as a shorthand for $\{0, 1, \ldots, N\}$.

By Proposition 7.45 of Bertsekas and Shreve (1978), the safety probability in (4.3) can be computed as

$$p_{s_0}^{\mu,\gamma}(W) = \mathbf{1}_W(s_0) \int_{W^N} \prod_{k=0}^{N-1} \nu^{\mu_k,\gamma_k}(ds_{k+1}|s_k) = E_{s_0}^{\mu,\gamma}\left[\prod_{k=0}^{N} \mathbf{1}_W(s_k)\right], \tag{4.4}$$

where $E_{s_0}^{\mu,\gamma}$ denotes the expectation with respect to the probability measure $P_{s_0}^{\mu,\gamma}$ on the sample space $\Omega$. This is analogous to the multiplicative payoff given in Abate et al. (2008) for the single-player safety problem.

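Now consider the probabilistic reach-avoid problem. Assume that Borel sets $R, W' \in \mathcal{B}(S)$ are given as target set and safe set, respectively, with $R \subseteq W'$. The probability that the sample path $(s_0, s_1, \ldots, s_N)$ reaches $R$ while staying inside $W'$ under fixed choices of $\mu \in \mathcal{M}_a$ and $\gamma \in \Gamma_b$ is given by

$$r_{s_0}^{\mu,\gamma}(R,W') := P_{s_0}^{\mu,\gamma}(\{(s_0, \ldots, s_N) : \exists k \in [0,N],\ (s_k \in R) \wedge (s_j \in W',\ \forall j \in [0,k])\})$$

$$= P_{s_0}^{\mu,\gamma}\left(\bigcup_{k=0}^{N} (W' \setminus R)^k \times R\right) = \sum_{k=0}^{N} P_{s_0}^{\mu,\gamma}\left((W' \setminus R)^k \times R\right), \tag{4.5}$$

where the last equality in (4.5) follows by the fact that the union is disjoint. Again by Proposition 7.45 of Bertsekas and Shreve (1978), this probability can be computed as

$$r_{s_0}^{\mu,\gamma}(R,W') = \mathbf{1}_R(s_0) + \mathbf{1}_{W' \setminus R}(s_0) \int_{S^N} \sum_{k=1}^{N} \left(\prod_{j=1}^{k-1} \mathbf{1}_{W' \setminus R}(s_j)\right) \mathbf{1}_R(s_k) \prod_{i=0}^{N-1} \nu^{\mu_i,\gamma_i}(ds_{i+1}|s_i)$$

$$= E_{s_0}^{\mu,\gamma}\left[\sum_{k=0}^{N} \left(\prod_{j=0}^{k-1} \mathbf{1}_{W' \setminus R}(s_j)\right) \mathbf{1}_R(s_k)\right], \tag{4.6}$$

where $E_{s_0}^{\mu,\gamma}$ denotes the expectation with respect to the probability measure $P_{s_0}^{\mu,\gamma}$ on the sample space $\Omega$. This is analogous to the sum-multiplicative payoff given in Summers and Lygeros (2010) for the single-player reach-avoid problem.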

The connection between the safety problem and reach-avoid problem is established by the observation that the hybrid state remains inside a safe set $W$ for all $k = 0, 1, \ldots, N$ if and only if it does not reach the unsafe set $S \setminus W$ for any $k = 0, 1, \ldots, N$. Mathematically speaking, for any $\mu \in \mathcal{M}_a$ and $\gamma \in \Gamma_b$,

$$p_{s_0}^{\mu,\gamma}(W) = 1 - r_{s_0}^{\mu,\gamma}(S \setminus W, S). \tag{4.7}$$

Through  this  relation,  the  computation  of  the  safety  probability  can  be  viewed  as  a  special  case 
of  the  computation  of  the  reach-avoid  probability.  As  such,  we  will  now  focus  on  the  reach-avoid 
problem,  with  the  understanding  that  any  results  for  the  reach-avoid  problem  can  be  specialized  to 
the  safety  problem  via  equation  (4.7). 
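
The payoffs inside the expectations of (4.4) and (4.6), and the complement relation (4.7), can be checked path by path. The following is a minimal sketch; the function names and the four-state example are hypothetical, and sets are represented by membership predicates.

```python
from itertools import product

def safety_payoff(path, in_W):
    """Product of the indicators 1_W(s_k) over the whole path, as in (4.4)."""
    return int(all(in_W(s) for s in path))

def reach_avoid_payoff(path, in_R, in_Wp):
    """Sum-multiplicative payoff of (4.6): 1 iff the path enters R before leaving W'."""
    for s in path:
        if in_R(s):
            return 1      # target reached while all earlier states were in W' minus R
        if not in_Wp(s):
            return 0      # left the safe set before reaching the target
    return 0              # stayed in W' minus R for the whole horizon

# Check relation (4.7) on every path of length 3 over a toy state set.
states = range(4)
W = {0, 1}
relation_holds = all(
    safety_payoff(p, lambda s: s in W)
    == 1 - reach_avoid_payoff(p, lambda s: s not in W, lambda s: True)
    for p in product(states, repeat=3)
)
```

Averaging either payoff over independently sampled executions would give a Monte Carlo estimate of the corresponding probability for fixed $\mu$ and $\gamma$.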

In a zero-sum Stackelberg, or equivalently a max-min, formulation of the probabilistic reach-avoid problem, the control selects a choice of feedback control policy $\mu \in \mathcal{M}_a$ to maximize (4.6), in anticipation of the worst-case response by the adversary in the selection of the feedback strategy $\gamma \in \Gamma_b$. More specifically, we define the worst-case reach-avoid probability under a player I policy $\mu \in \mathcal{M}_a$ as

$$r_{s_0}^{\mu}(R,W') = \inf_{\gamma \in \Gamma_b} r_{s_0}^{\mu,\gamma}(R,W'), \quad s_0 \in S. \tag{4.8}$$

The Stackelberg or max-min payoff for player I is then given by

$$r_{s_0}^{*}(R,W') := \sup_{\mu \in \mathcal{M}_a} r_{s_0}^{\mu}(R,W'), \quad s_0 \in S. \tag{4.9}$$

The optimal policy of player I and the optimal strategy of player II are interpreted in terms of Stackelberg equilibrium strategies. In particular, by the definitions given in Breton et al. (1988) and Başar and Olsder (1999), a policy $\mu^* \in \mathcal{M}_a$ is a Stackelberg equilibrium policy for player I if it satisfies

$$r_{s_0}^{\mu^*}(R,W') = r_{s_0}^{*}(R,W'), \quad \forall s_0 \in S. \tag{4.10}$$

For a given choice of equilibrium policy $\mu^* \in \mathcal{M}_a$ for player I, a strategy $\gamma^* \in \Gamma_b$ is a Stackelberg equilibrium strategy for player II if it satisfies

$$r_{s_0}^{\mu^*,\gamma^*}(R,W') \le r_{s_0}^{\mu^*,\gamma}(R,W'), \quad \forall s_0 \in S,\ \gamma \in \Gamma_b. \tag{4.11}$$

Any strategy pair $(\mu^*, \gamma^*)$ satisfying (4.10) and (4.11) is referred to as a Stackelberg solution to (4.8) and (4.9). Relating these notions to the probabilistic reachability problem of interest, we will call the Stackelberg payoff the max-min reach-avoid probability, an equilibrium policy for player I a max-min control policy, and an equilibrium strategy for player II a worst-case adversary strategy. We can now give a precise statement of the probabilistic reach-avoid problem for a DTSHG.

Problem 4.1. Given a DTSHG $\mathcal{H}$, target set $R \in \mathcal{B}(S)$, and safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

(I) Compute the max-min reach-avoid probability $r_{s_0}^{*}(R,W')$, $\forall s_0 \in S$;

(II) Find a max-min control policy $\mu^* \in \mathcal{M}_a$, whenever it exists;

(III) Find a worst-case adversary strategy $\gamma^* \in \Gamma_b$, whenever it exists.

4.4  Max-min  Probability  Computation 

In this section, we provide a detailed proof of the main result from Kamgarpour et al. (2011), as the solution to Problem 4.1. In particular, it will be shown that under certain regularity assumptions, the max-min probability $r_{s_0}^{*}(R,W')$ can be computed using an appropriate dynamic programming algorithm, and that there exists a max-min Markov policy $\mu^* \in \mathcal{M}_a$ for player I which achieves this probability under the worst-case player II strategy. Following the proof, we will discuss some practical implications of the theorem and specialize the results to a stochastic game formulation of the probabilistic safety problem. Finally, a concrete example will be provided to illustrate the procedure for computing $r_{s_0}^{*}(R,W')$, as well as the max-min policy $\mu^* \in \mathcal{M}_a$ for player I and the worst-case strategy $\gamma^* \in \Gamma_b$ for player II.

4.4.1  Main  Theorem 

For  our  theoretical  derivations,  we  impose  the  following  regularity  assumptions. 

Assumption 4.1.

(a) For each $s = (q,x) \in S$ and $E_1 \in \mathcal{B}(\mathbb{R}^{n(q)})$, the function $(a,b) \to \nu_x(E_1|s,a,b)$ is continuous;

(b) For each $s = (q,x) \in S$ and $q' \in Q$, the function $(a,b) \to \nu_q(q'|s,a,b)$ is continuous;

(c) For each $s = (q,x) \in S$, $q' \in Q$, and $E_2 \in \mathcal{B}(\mathbb{R}^{n(q')})$, the function $(a,b) \to \nu_r(E_2|s,a,b,q')$ is continuous.


The need for continuity assumptions on the stochastic kernels commonly arises in the stochastic game literature (see for example Kumar and Shiau, 1981; Maitra and Parthasarathy, 1970; Nowak, 1985; Gonzalez-Trejo et al., 2002), due to the difficulties in ensuring the measurability of value functions under max-min dynamic programming operations. Following the approach in Nowak (1985) and Rieder (1991), we only assume continuity of the stochastic kernels in the actions of player I and player II, but not necessarily in the system state. This allows for stochastic hybrid systems in which transition probabilities change abruptly with changes in the system state. Furthermore, if the action spaces $C_a$ and $C_b$ are finite or countable, then the assumptions are satisfied under the discrete topology on $C_a$ and $C_b$. Also, the assumptions on $\nu_x$ and $\nu_r$ are satisfied if these kernels admit density functions that are continuous in the player inputs.

For the construction of a dynamic programming solution to Problem 4.1, we define a max-min dynamic programming operator $\mathcal{T}$ which takes as its argument a Borel measurable function $J : S \to [0,1]$ and produces another real-valued function on $S$:

$$\mathcal{T}(J)(s) := \mathbf{1}_R(s) + \sup_{a \in C_a} \inf_{b \in C_b} \mathbf{1}_{W' \setminus R}(s) H(s,a,b,J), \tag{4.12}$$

where $H(s,a,b,J) = \int_S J(s')\,\nu(ds'|s,a,b)$.

We now proceed to show that the max-min reach-avoid probability, along with the max-min control policy and worst-case disturbance strategy, can be derived from a dynamic programming algorithm using the operator $\mathcal{T}$.
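
To make the operator in (4.12) concrete, the following sketch evaluates it on a game with finitely many states and inputs, where the integral $H$ reduces to a finite sum. The three-state game, the table `P`, and all names are hypothetical and chosen only so that the values can be verified by hand.

```python
def H(P, J, s, a, b):
    """H(s, a, b, J): expectation of J under the kernel nu(.|s, a, b), as a finite sum."""
    return sum(p * J[s2] for s2, p in P[(s, a, b)].items())

def T(P, J, states, A, B, R, Wp):
    """One application of the max-min operator (4.12) on a finite game."""
    TJ = {}
    for s in states:
        if s in R:
            TJ[s] = 1.0   # the indicator of R contributes 1; the sup-inf term vanishes
        elif s in Wp:
            TJ[s] = max(min(H(P, J, s, a, b) for b in B) for a in A)
        else:
            TJ[s] = 0.0   # outside W' both terms vanish
    return TJ

# Toy game: state 0 is safe, state 1 is the target, state 2 is unsafe.
states, A, B = [0, 1, 2], [0, 1], [0, 1]
R, Wp = {1}, {0, 1}
P = {
    (0, 0, 0): {1: 0.6, 2: 0.4},
    (0, 0, 1): {1: 0.2, 2: 0.8},
    (0, 1, 0): {1: 0.5, 0: 0.5},
    (0, 1, 1): {1: 0.4, 0: 0.5, 2: 0.1},
}
J_N = {0: 0.0, 1: 1.0, 2: 0.0}           # indicator of R
J_1 = T(P, J_N, states, A, B, R, Wp)     # one step to go
J_0 = T(P, J_1, states, A, B, R, Wp)     # two steps to go
```

In this toy game, input `a = 0` guarantees only 0.2 against the worst-case `b`, while `a = 1` guarantees 0.4 with one step to go and 0.6 with two steps to go, so the max-min value grows with the horizon.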

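Theorem 4.1. Let $\mathcal{H}$ be a DTSHG satisfying Assumption 4.1. Let $R, W' \in \mathcal{B}(S)$ be Borel sets such that $R \subseteq W'$. Let the operator $\mathcal{T}$ be defined as in (4.12). Then the composition $\mathcal{T}^N = \mathcal{T} \circ \mathcal{T} \circ \cdots \circ \mathcal{T}$ ($N$ times) is well-defined and

(a) $r_{s_0}^{*}(R,W') = \mathcal{T}^N(\mathbf{1}_R)(s_0)$, $\forall s_0 \in S$;

(b) There exists a player I policy $\mu^* \in \mathcal{M}_a$ and a player II strategy $\gamma^* \in \Gamma_b$ satisfying

$$r_{s_0}^{\mu,\gamma^*}(R,W') \le r_{s_0}^{*}(R,W') = r_{s_0}^{\mu^*,\gamma^*}(R,W') \le r_{s_0}^{\mu^*,\gamma}(R,W'), \tag{4.13}$$

$\forall s_0 \in S$, $\mu \in \mathcal{M}_a$, and $\gamma \in \Gamma_b$. In particular, $\mu^*$ is a max-min control policy, and $\gamma^*$ is a worst-case adversary strategy.

(c) Let $J_N^* = \mathbf{1}_R$, $J_k^* = \mathcal{T}^{N-k}(\mathbf{1}_R)$, $k = 0, 1, \ldots, N-1$. If $\mu^* \in \mathcal{M}_a$ is a player I policy which satisfies

$$\mu_k^*(s) \in \arg\sup_{a \in C_a} \inf_{b \in C_b} H(s,a,b,J_{k+1}^*), \tag{4.14}$$

$\forall s \in W' \setminus R$, $k = 0, 1, \ldots, N-1$, then $\mu^*$ is a max-min control policy. If $\gamma^* \in \Gamma_b$ is a player II strategy which satisfies

$$\gamma_k^*(s,a) \in \arg\inf_{b \in C_b} H(s,a,b,J_{k+1}^*), \tag{4.15}$$

$\forall s \in W' \setminus R$, $a \in C_a$, $k = 0, 1, \ldots, N-1$, then $\gamma^*$ is a worst-case adversary strategy.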


First, we will present a recursive procedure for computing the reach-avoid probability $r_{s_0}^{\mu,\gamma}(R,W')$ under fixed choices of player I policy $\mu \in \mathcal{M}_a$ and player II strategy $\gamma \in \Gamma_b$. Consider the payoff functions $J_k^{\mu,\gamma} : S \to [0,1]$, $k = 0, \ldots, N$, defined as

$$J_N^{\mu,\gamma}(s_N) = \mathbf{1}_R(s_N),$$

$$J_k^{\mu,\gamma}(s_k) = E_{s_k}^{\mu,\gamma}\left[\mathbf{1}_R(s_k) + \sum_{j=k+1}^{N} \left(\prod_{i=k}^{j-1} \mathbf{1}_{W' \setminus R}(s_i)\right) \mathbf{1}_R(s_j)\right], \quad k = 0, \ldots, N-1. \tag{4.16}$$

From this definition we can infer that $r_{s_0}^{\mu,\gamma}(R,W') = J_0^{\mu,\gamma}(s_0)$, $\forall s_0 \in S$. Now consider a recursion operator $\mathcal{T}_{f,g}$ parameterized by a one-stage player I policy $f : S \to C_a$ and a one-stage player II strategy $g : S \times C_a \to C_b$:

$$\mathcal{T}_{f,g}(J)(s) = \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) H(s,f(s),g(s,f(s)),J), \quad s \in S, \tag{4.17}$$

where $H$ is defined in (4.12). It can be checked in a straightforward manner that the operator $\mathcal{T}_{f,g}$ satisfies a monotonicity property: for any Borel measurable functions $J, J'$ from $S$ to $[0,1]$ such that $J \le J'$, $\mathcal{T}_{f,g}(J)(s) \le \mathcal{T}_{f,g}(J')(s)$, $\forall s \in S$. This property will become useful later on in the dynamic programming arguments.

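The following result provides a recursive algorithm for computing the functions $J_k^{\mu,\gamma}$.

Lemma 4.1. Let $\mu \in \mathcal{M}_a$, $\gamma \in \Gamma_b$. Then the payoff functions $J_k^{\mu,\gamma}$, $k = 0, 1, \ldots, N$ satisfy

$$J_k^{\mu,\gamma}(s) = \mathcal{T}_{\mu_k,\gamma_k}(J_{k+1}^{\mu,\gamma})(s), \quad \forall s \in S,\ k = 0, 1, \ldots, N-1. \tag{4.18}$$

Proof. For the case of $k = N-1$, $J_N^{\mu,\gamma} = \mathbf{1}_R$ implies that for any $s \in S$,

$$J_{N-1}^{\mu,\gamma}(s) = \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) \int_S \mathbf{1}_R(s_N)\, \nu^{\mu_{N-1},\gamma_{N-1}}(ds_N|s) = \mathcal{T}_{\mu_{N-1},\gamma_{N-1}}(J_N^{\mu,\gamma})(s).$$

For the case of $k < N-1$, the expression for $J_k^{\mu,\gamma}$ in (4.16) implies that for any $s \in S$,

$$J_k^{\mu,\gamma}(s) = \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) \int_S \Bigg[\mathbf{1}_R(s_{k+1}) + \mathbf{1}_{W' \setminus R}(s_{k+1}) \int_{S^{N-k-1}} \sum_{j=k+2}^{N} \left(\prod_{i=k+2}^{j-1} \mathbf{1}_{W' \setminus R}(s_i)\right) \mathbf{1}_R(s_j) \prod_{l=k+1}^{N-1} \nu^{\mu_l,\gamma_l}(ds_{l+1}|s_l)\Bigg] \nu^{\mu_k,\gamma_k}(ds_{k+1}|s)$$

$$= \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) \int_S J_{k+1}^{\mu,\gamma}(s_{k+1})\, \nu^{\mu_k,\gamma_k}(ds_{k+1}|s).$$

It follows from the definition of $\mathcal{T}_{f,g}$ that the last expression above is $\mathcal{T}_{\mu_k,\gamma_k}(J_{k+1}^{\mu,\gamma})(s)$, thus concluding the proof. □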


Next, we will show that under Assumption 4.1, the operator $\mathcal{T}$ defined in (4.12) preserves suitable measurability properties (thus allowing recursive dynamic programming calculations) and that there exist a one-stage player I policy and a one-stage player II strategy achieving the supremum and infimum in (4.12).

In the following, we state a special case of Corollary 1 given in Brown and Purves (1973). This result allows us to show that the operator $\mathcal{T}$ preserves Borel measurability and that it is sufficient to consider Borel measurable selectors.

Lemma 4.2. Let $X$, $Y$ be complete separable metric spaces such that $Y$ is compact, and $f$ be a real-valued Borel measurable function defined on $X \times Y$ such that $f(x,\cdot)$ is lower semicontinuous with respect to the topology on $Y$. Define $f^* : X \to \mathbb{R} \cup \{\pm\infty\}$ by

$$f^*(x) = \inf_{y \in Y} f(x,y).$$

(a) The set

$$I = \{x \in X : \text{for some } y \in Y,\ f(x,y) = f^*(x)\}$$

is Borel measurable.

(b) For every $\varepsilon > 0$, there exists a Borel measurable function $\varphi : X \to Y$, satisfying, for all $x \in X$,

$$f(x,\varphi(x)) = f^*(x), \quad \text{if } x \in I,$$

$$f(x,\varphi(x)) \le \begin{cases} f^*(x) + \varepsilon, & \text{if } x \notin I,\ f^*(x) > -\infty, \\ -1/\varepsilon, & \text{if } x \notin I,\ f^*(x) = -\infty. \end{cases}$$


For the purpose of showing that the supremum and infimum in the expression for $\mathcal{T}$ are achieved, we will also need the following technical result.

Lemma 4.3. Let $f$ be a bounded real-valued Borel measurable function on a Borel space $Y$, and $t$ be a Borel measurable transition probability from a Borel space $X$ into $Y$ such that $t(B|\cdot)$ is continuous on $X$ for each $B \in \mathcal{B}(Y)$. Then the function $x \to \int_Y f(y)\,t(dy|x)$ is continuous on $X$.

This  was  stated  as  Fact  3.9  in  Nowak  (1985).  Since  neither  a  proof  nor  relevant  references  are 
provided  in  Nowak  (1985),  and  also  given  that  this  result  is  the  primary  use  for  Assumption  4.1,  a 
detailed  proof  is  given  in  appendix  A. 

We now prove a selection result for the max-min operator $\mathcal{T}$. For notational convenience, we denote by $\mathcal{F}$ the set of Borel measurable functions from $S$ to $[0,1]$.

Proposition 4.1. If Assumption 4.1 holds, then

(a) $\forall J \in \mathcal{F}$, $\mathcal{T}(J) \in \mathcal{F}$;

(b) For any $J \in \mathcal{F}$, there exists a Borel measurable function $g^* : S \times C_a \to C_b$ such that, for all $(s,a) \in S \times C_a$,

$$g^*(s,a) \in \arg\inf_{b \in C_b} H(s,a,b,J);$$

(c) For any $J \in \mathcal{F}$, there exists a Borel measurable function $f^* : S \to C_a$, such that for all $s \in S$,

$$f^*(s) \in \arg\sup_{a \in C_a} \inf_{b \in C_b} H(s,a,b,J).$$

Proof. Let $J \in \mathcal{F}$. Define a function $F_J : S \times C_a \times C_b \to \mathbb{R}$ as $F_J(s,a,b) = H(s,a,b,J)$. From the definition of $H$, the range of $F_J$ is contained in $[0,1]$. By the Borel measurability of $J$ and $\nu$, Proposition 7.29 of Bertsekas and Shreve (1978) implies that $F_J$ is Borel measurable. Furthermore, by Assumption 4.1 and Lemma 4.3, $F_J(s,a,b)$ is continuous in $a$ and $b$, for each $s \in S$. Now consider a function $\tilde{F}_J(s,a) = \inf_{b \in C_b} F_J(s,a,b)$. By the compactness of $C_b$ and continuity of $F_J$ in $b$, this infimum is achieved for each fixed $(s,a)$ (see for example Rudin, 1976). Thus, applying Lemma 4.2, we have that there exists a Borel measurable function $g^* : S \times C_a \to C_b$ for which part (b) holds. Furthermore, by Proposition 7.32 of Bertsekas and Shreve (1978), $\tilde{F}_J$ is continuous in $a$. Let $F_J^*(s) = \sup_{a \in C_a} \tilde{F}_J(s,a) = -\inf_{a \in C_a} -\tilde{F}_J(s,a)$. Then, by a repeated application of Lemma 4.2, there exists a Borel measurable function $f^* : S \to C_a$ such that part (c) holds. By the composition of Borel measurable functions, this also implies that $F_J^*$ is Borel measurable.

Finally, it can be observed that $\mathcal{T}(J)(s) = \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) F_J^*(s)$, $\forall s \in S$. Given that Borel measurability is preserved under summation and multiplication (see for example Folland, 1999, Proposition 2.6), $\mathcal{T}(J)$ is Borel measurable. It is also clear that $0 \le \mathcal{T}(J) \le 1$. Part (a) then follows. □

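In the following two propositions, we show by a dynamic programming argument that $\mathcal{T}^N(\mathbf{1}_R)$ both upper bounds and lower bounds the max-min reach-avoid probability.

Proposition 4.2.

(a) $\forall s_0 \in S$, $\mathcal{T}^N(\mathbf{1}_R)(s_0) \le r_{s_0}^{*}(R,W')$;

(b) There exists $\mu^* \in \mathcal{M}_a$ such that, for any $\gamma \in \Gamma_b$, $\mathcal{T}^N(\mathbf{1}_R)(s_0) \le r_{s_0}^{\mu^*,\gamma}(R,W')$, $\forall s_0 \in S$.

Proof. For notational convenience, we define $J_k^* := \mathcal{T}^{N-k}(\mathbf{1}_R)$, $k = 0, 1, \ldots, N$. First, we prove the following claim by backwards induction on $k$: there exists $\mu_{k \to N}^* = (\mu_k^*, \mu_{k+1}^*, \ldots, \mu_{N-1}^*) \in \mathcal{M}_a$ such that, for any $\gamma_{k \to N} = (\gamma_k, \gamma_{k+1}, \ldots, \gamma_{N-1}) \in \Gamma_b$, $J_k^* \le J_k^{\mu_{k \to N}^*, \gamma_{k \to N}}$.

Let $\gamma_{k \to N} \in \Gamma_b$ be arbitrary. The case of $k = N$ is trivial. Now assume that this holds for $k = h$. Let $\mu_{h \to N}^* \in \mathcal{M}_a$ be a player I policy satisfying the induction hypothesis. By Proposition 4.1(c), there exists a Borel measurable function $f^* : S \to C_a$ such that

$$f^*(s) \in \arg\sup_{a \in C_a} \inf_{b \in C_b} H(s,a,b,J_h^*), \quad \forall s \in S.$$

Choose a policy $\mu_{h-1 \to N}^* = (f^*, \mu_{h \to N}^*)$. Then by the monotonicity of the operator $\mathcal{T}_{f,g}$ and Lemma 4.1, we have for each $s \in S$:

$$J_{h-1}^{\mu_{h-1 \to N}^*, \gamma_{h-1 \to N}}(s) = \mathcal{T}_{f^*,\gamma_{h-1}}(J_h^{\mu_{h \to N}^*, \gamma_{h \to N}})(s) \ge \mathcal{T}_{f^*,\gamma_{h-1}}(J_h^*)(s)$$

$$= \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) H(s, f^*(s), \gamma_{h-1}(s, f^*(s)), J_h^*)$$

$$\ge \mathbf{1}_R(s) + \inf_{b \in C_b} \mathbf{1}_{W' \setminus R}(s) H(s, f^*(s), b, J_h^*)$$

$$= \mathcal{T}(J_h^*)(s) = J_{h-1}^*(s).$$

The claim then follows by induction. From this, we obtain $\mu_{0 \to N}^* \in \mathcal{M}_a$ satisfying $\mathcal{T}^N(\mathbf{1}_R)(s_0) = J_0^*(s_0) \le J_0^{\mu_{0 \to N}^*, \gamma_{0 \to N}}(s_0) = r_{s_0}^{\mu_{0 \to N}^*, \gamma_{0 \to N}}(R,W')$, $\forall s_0 \in S$, $\gamma_{0 \to N} \in \Gamma_b$, and hence satisfying statement (b). Furthermore, since $\gamma_{0 \to N}$ is arbitrary, $\mathcal{T}^N(\mathbf{1}_R)(s_0) \le \inf_{\gamma \in \Gamma_b} r_{s_0}^{\mu_{0 \to N}^*, \gamma}(R,W')$, $\forall s_0 \in S$. Statement (a) then follows. □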


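Proposition 4.3.

(a) $\forall s_0 \in S$, $r_{s_0}^{*}(R,W') \le \mathcal{T}^N(\mathbf{1}_R)(s_0)$;

(b) There exists $\gamma^* \in \Gamma_b$ such that, for any $\mu \in \mathcal{M}_a$, $r_{s_0}^{\mu,\gamma^*}(R,W') \le \mathcal{T}^N(\mathbf{1}_R)(s_0)$, $\forall s_0 \in S$.

Proof. As in the proof of Proposition 4.2, we define $J_k^* := \mathcal{T}^{N-k}(\mathbf{1}_R)$, $k = 0, 1, \ldots, N$. First, we prove the following claim by backwards induction on $k$: there exists $\gamma_{k \to N}^* = (\gamma_k^*, \gamma_{k+1}^*, \ldots, \gamma_{N-1}^*) \in \Gamma_b$ such that, for any $\mu_{k \to N} = (\mu_k, \mu_{k+1}, \ldots, \mu_{N-1}) \in \mathcal{M}_a$, $J_k^{\mu_{k \to N}, \gamma_{k \to N}^*} \le J_k^*$.

Let $\mu_{k \to N} \in \mathcal{M}_a$ be arbitrary. The case of $k = N$ is trivial. Now assume that this holds for $k = h$. Let $\gamma_{h \to N}^* \in \Gamma_b$ be a player II strategy satisfying the induction hypothesis. By Proposition 4.1(b), there exists a Borel measurable function $g^* : S \times C_a \to C_b$ such that

$$g^*(s,a) \in \arg\inf_{b \in C_b} H(s,a,b,J_h^*), \quad \forall s \in S,\ a \in C_a.$$

Choose a strategy $\gamma_{h-1 \to N}^* = (g^*, \gamma_{h \to N}^*)$. Then by the monotonicity of the operator $\mathcal{T}_{f,g}$ and Lemma 4.1, we have for each $s \in S$:

$$J_{h-1}^{\mu_{h-1 \to N}, \gamma_{h-1 \to N}^*}(s) = \mathcal{T}_{\mu_{h-1},g^*}(J_h^{\mu_{h \to N}, \gamma_{h \to N}^*})(s) \le \mathcal{T}_{\mu_{h-1},g^*}(J_h^*)(s)$$

$$= \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) H(s, \mu_{h-1}(s), g^*(s, \mu_{h-1}(s)), J_h^*)$$

$$= \mathbf{1}_R(s) + \inf_{b \in C_b} \mathbf{1}_{W' \setminus R}(s) H(s, \mu_{h-1}(s), b, J_h^*)$$

$$\le \mathcal{T}(J_h^*)(s) = J_{h-1}^*(s).$$

The claim then follows by induction. From this, we obtain $\gamma_{0 \to N}^* \in \Gamma_b$ satisfying $r_{s_0}^{\mu_{0 \to N}, \gamma_{0 \to N}^*}(R,W') = J_0^{\mu_{0 \to N}, \gamma_{0 \to N}^*}(s_0) \le J_0^*(s_0) = \mathcal{T}^N(\mathbf{1}_R)(s_0)$, $\forall s_0 \in S$, $\mu_{0 \to N} \in \mathcal{M}_a$, and hence statement (b). This in turn implies that $r_{s_0}^{\mu}(R,W') = \inf_{\gamma \in \Gamma_b} r_{s_0}^{\mu,\gamma}(R,W') \le \mathcal{T}^N(\mathbf{1}_R)(s_0)$, $\forall s_0 \in S$, $\mu \in \mathcal{M}_a$, proving statement (a). □

Combining the results of Propositions 4.2 and 4.3, we can now prove Theorem 4.1.

Proof. Statement (a) of Theorem 4.1 follows directly from Proposition 4.2(a) and 4.3(a). The player I policy $\mu^*$ and player II strategy $\gamma^*$ satisfying statement (b) are provided by Proposition 4.2(b) and 4.3(b), respectively. Finally, it can be inferred from the proofs of Propositions 4.2 and 4.3 that any player I policy $\mu^*$ and player II strategy $\gamma^*$ satisfying the conditions in statement (c) is a max-min policy or worst-case strategy, respectively. □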

4.4.2  Implications  of  the  Main  Theorem 

1) Probabilistic reachability computation: By statement (a) of Theorem 4.1, the max-min reach-avoid probability can be computed using a sup-inf dynamic programming procedure. This can be viewed as the counterpart to the HJI equation for discrete-time stochastic systems. Namely, instead of solving a terminal value PDE backwards in time, the probabilistic reachability computation for a DTSHG involves solving an integro-difference equation backwards in time, as described by the dynamic programming operator $\mathcal{T}$. As in the case of the HJI equation, the solution to the integro-difference equation in general does not have a closed-form expression, and as such requires numerical approximation. However, whereas the approximation of the PDE solution involves approximation of the spatial derivatives and an integral in time, the approximation of the solution to the integro-difference equation requires approximation of spatial integrals. To illustrate this, suppose that we have computed the optimal cost-to-go function $J_{k+1}^* := \mathcal{T}^{N-(k+1)}(\mathbf{1}_R)$ at time $k+1$; then the optimal cost-to-go function $J_k^*$ at time $k$ is computed as

$$J_k^*(s) = \mathcal{T}(J_{k+1}^*)(s) = \mathbf{1}_R(s) + \sup_{a \in C_a} \inf_{b \in C_b} \mathbf{1}_{W' \setminus R}(s) H(s,a,b,J_{k+1}^*), \quad \forall s \in S. \tag{4.19}$$

For a fixed $s = (q,x) \in S$, and inputs $a \in C_a$, $b \in C_b$, the explicit form of $H(s,a,b,J_{k+1}^*)$ in the above expression is given by

$$H((q,x),a,b,J_{k+1}^*) = \int_S J_{k+1}^*(s')\,\nu(ds'|(q,x),a,b) \tag{4.20}$$

$$= \nu_q(q|(q,x),a,b) \int_{\mathbb{R}^{n(q)}} J_{k+1}^*(q,x')\,\nu_x(dx'|(q,x),a,b) + \sum_{q' \neq q} \nu_q(q'|(q,x),a,b) \int_{\mathbb{R}^{n(q')}} J_{k+1}^*(q',x')\,\nu_r(dx'|(q,x),a,b,q').$$

Thus, the approximation of $\mathcal{T}$ is tantamount to the approximation of the integral in (4.20) for discretized values of the hybrid state and player inputs. Assuming that the stochastic kernels $\nu_x$ and $\nu_r$ are described in terms of probability density functions, this approximation can be performed using Riemann or Lebesgue integrals. For the single player case, a numerical scheme of such type is proposed in Abate et al. (2007) for the computation of the safety probability. In particular, it is shown that piecewise constant approximations of the value function on a grid of the continuous state space converge uniformly to the optimal value function, at a rate that is linear in the grid size parameter. We anticipate that a similar result can be shown for the case of a DTSHG. However, it can be observed that the computational cost of such an approach scales exponentially with the dimension of the continuous state space, which currently limits the application of our approach to problems with continuous state dimensions of $n \le 4$. The reduction in computation time is a topic of ongoing research (Esmaeil Zadeh Soudjani and Abate, 2011).
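
As an illustration of the spatial integral approximation, the sketch below evaluates $H$ for the two-mode jump Markov example of section 4.2.1 with a value function that is piecewise constant on grid cells. It is not the scheme of Abate et al. (2007); it is a minimal sketch in which the dwell probabilities `P_DWELL` are hypothetical, the mode-2 gain of 1/2 is an assumption, and any probability mass falling outside the grid is simply dropped.

```python
GAIN = {1: 2.0, 2: 0.5}     # mode-2 gain of 1/2 is an assumption
P_DWELL = {1: 0.9, 2: 0.8}  # hypothetical values for p_1, p_2

def cell_mass_uniform(lo, hi, c):
    """Probability that U[c - 1, c + 1] assigns to the grid cell [lo, hi]."""
    overlap = min(hi, c + 1.0) - max(lo, c - 1.0)
    return max(overlap, 0.0) / 2.0

def H_continuous(J_nodes, edges, c):
    """Integral of a piecewise constant J against U[c - 1, c + 1]."""
    return sum(J_nodes[j] * cell_mass_uniform(edges[j], edges[j + 1], c)
               for j in range(len(J_nodes)))

def H_hybrid(J, edges, q, x, a, b):
    """Sum over next modes of nu_q times the continuous integral.

    In this example the reset kernel equals nu_x, so the same
    uniform distribution is used whether or not the mode changes.
    """
    c = GAIN[q] * x + a + b
    total = 0.0
    for q_next in (1, 2):
        prob = P_DWELL[q] if q_next == q else 1.0 - P_DWELL[q]
        total += prob * H_continuous(J[q_next], edges, c)
    return total

# Grid on [-10, 10] with cell width 0.5; J is 1 for x > 0 in mode 1, and 1 everywhere in mode 2.
edges = [-10.0 + 0.5 * i for i in range(41)]
mid = [(edges[j] + edges[j + 1]) / 2 for j in range(40)]
J = {1: [1.0 if m > 0 else 0.0 for m in mid], 2: [1.0] * 40}
value = H_hybrid(J, edges, q=1, x=0.25, a=0.5, b=-1.0)
```

With these inputs the uniform distribution is centered at 0, so the mode-1 integral is 0.5 and the mode-2 integral is 1, giving a value of 0.9 · 0.5 + 0.1 · 1 = 0.55. The number of cells grows exponentially with the continuous state dimension, which is the scaling limitation noted above.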

2) Controller synthesis: Equations (4.14) and (4.15) provide us with sufficient conditions
for optimality of the players' policies and strategies. In particular, they can be used to synthesize
a max-min control policy for player I from the value functions computed through the dynamic
programming recursion. To illustrate, suppose that the input ranges $C_a$ and $C_b$, along with the
state space $S$, have been appropriately discretized into $C_a^d$, $C_b^d$, and $S^d$. Then we can numerically
approximate the optimal cost-to-go functions $J_k^*$, $k = 0,1,\dots,N$, with functions $J_k^d$ computed on
$S^d$, for example according to the method suggested in Abate et al. (2007). At the $k$-th iteration of
this dynamic programming procedure, we can store the optimal control inputs

$$\hat{\mu}_k^*(s^d) \in \arg \sup_{a \in C_a^d} \inf_{b \in C_b^d} H^d(s^d,a,b,J_{k+1}^d), \quad s^d \in (W' \setminus R) \cap S^d,$$

where $H^d$ is an appropriate discrete approximation of the operator $H$. This provides us with a
discrete representation for an approximate max-min control policy $\hat{\mu}^* = (\hat{\mu}_0^*, \hat{\mu}_1^*, \dots, \hat{\mu}_{N-1}^*)$.
In particular, at time step $k$, $\hat{\mu}_k^*(s^d)$ represents the optimal input selection over the grid volume
corresponding to the grid node $s^d$. For the single-player case, it has also been shown in Abate et al.
(2007) that the approximate control policy synthesized in such a manner provides a performance level
that converges to the optimal as the size of each grid volume is reduced.
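
The synthesis step described above can be sketched in Python once the game has been reduced to finitely many grid nodes and inputs. The transition tensor `P`, the boolean masks `in_R` and `in_W`, and the function name `synthesize_maxmin_policy` are hypothetical names introduced for this illustration; the toy numbers carry no physical meaning.

```python
import numpy as np

def synthesize_maxmin_policy(P, in_R, in_W, N):
    """Backward recursion J_k = 1_R + 1_{W' minus R} max_a min_b H, storing maximizers.

    P[a, b, s, t] is an assumed finite transition tensor: the probability of
    moving from grid node s to grid node t under inputs (a, b). in_R and in_W
    are boolean masks of the target set R and safe set W' (R contained in W').
    """
    n_states = P.shape[2]
    J = in_R.astype(float)                       # terminal condition J_N = 1_R
    policy = np.zeros((N, n_states), dtype=int)
    for k in range(N - 1, -1, -1):
        H = P @ J                                # H[a, b, s] = sum_t P[a, b, s, t] J[t]
        worst = H.min(axis=1)                    # inner minimization over b
        policy[k] = worst.argmax(axis=0)         # outer maximization over a
        J = np.where(in_R, 1.0, np.where(in_W, worst.max(axis=0), 0.0))
    return J, policy

# Toy instance: node 0 is unsafe, node 1 lies in W' minus R, node 2 is the target.
in_R = np.array([False, False, True])
in_W = np.array([False, True, True])
p_reach = np.array([[0.9, 0.5], [0.7, 0.6]])     # P(node 1 -> node 2) for each (a, b)
P = np.zeros((2, 2, 3, 3))
P[:, :, 0, 0] = 1.0                              # unsafe node is absorbing
P[:, :, 2, 2] = 1.0                              # target node is absorbing
P[:, :, 1, 2] = p_reach
P[:, :, 1, 0] = 1.0 - p_reach
J0, policy = synthesize_maxmin_policy(P, in_R, in_W, N=1)
```

In this toy instance the second input of player I guarantees a reach probability of 0.6 against either adversary input, while the first guarantees only 0.5, so the stored policy selects the second input at node 1.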

3) Robustness and optimality: By statement (b) of Theorem 4.1, if the control were to choose
the max-min policy $\mu^*$ and the adversary were to deviate from the worst-case strategy $\gamma^*$, then
the reach-avoid probability will be at least $r_{s_0}^*(R,W')$. On the other hand, if the control were to
deviate from the max-min policy and the adversary were to choose the worst-case strategy, then
the reach-avoid probability will be at most $r_{s_0}^*(R,W')$. Thus, $\mu^*$ can be interpreted as a robust
control policy in the sense that by choosing $\mu^*$, the reach-avoid probability will be no less than
$r_{s_0}^*(R,W')$, regardless of any variations in adversary strategy within the class $\Gamma_b$. It can also be
interpreted as an optimal policy in the sense that it optimizes a worst-case performance index,
namely the worst-case reach-avoid probability with respect to fixed choices of control policies
within the class $\mathcal{M}_a$.

4) Probabilistic reach-avoid set: Consider the case in which the design specification requires
the controller to guarantee a reach-avoid probability of at least $(1 - \epsilon)$, for some small $\epsilon \in [0,1)$.
The set of initial conditions $S_0^{\epsilon}$ for which this specification is feasible can be derived from the
max-min reach-avoid probability as:

$$S_0^{\epsilon} = \{ s_0 \in S : r_{s_0}^*(R,W') \geq (1 - \epsilon) \}.$$

In other words, $S_0^{\epsilon}$ is the $(1 - \epsilon)$-superlevel set of the function $s_0 \mapsto r_{s_0}^*(R,W')$. This set can
then be used to provide guidance on where the system state should be initialized. In particular, if
the control were to select inputs according to the max-min policy, then the system state should be
initialized inside $S_0^{\epsilon}$ in order to ensure that the desired specifications will be met.
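
On a discretized state space, extracting this superlevel set is a one-line thresholding operation. The sketch below assumes the max-min reach-avoid probabilities have already been computed at each grid node; the array `values` holds made-up numbers and the function name `reach_avoid_set` is hypothetical.

```python
import numpy as np

def reach_avoid_set(values, eps):
    """Indices of grid nodes whose max-min reach-avoid probability is at least 1 - eps."""
    return np.flatnonzero(values >= 1.0 - eps)

values = np.array([1.0, 0.95, 0.85, 0.5])        # hypothetical values of r* on four nodes
feasible = reach_avoid_set(values, eps=0.1)
```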

4.4.3 Specialization to Stochastic Game Formulation of Safety Problem

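As discussed in section 4.3, the solution to the probabilistic safety problem can be obtained from
a complementary reach-avoid problem. In particular, for a given safe set $W \in \mathcal{B}(S)$, consider a
reach-avoid problem with the value function

$$\bar{r}_{s_0}^*(S \setminus W, S) := \inf_{\mu \in \mathcal{M}_a} \sup_{\gamma \in \Gamma_b} r_{s_0}^{\mu,\gamma}(S \setminus W, S), \quad s_0 \in S.$$

Then the max-min probability of safety is given by

$$p_{s_0}^*(W) = \sup_{\mu \in \mathcal{M}_a} \inf_{\gamma \in \Gamma_b} p_{s_0}^{\mu,\gamma}(W) = 1 - \bar{r}_{s_0}^*(S \setminus W, S), \quad s_0 \in S. \tag{4.21}$$

By minor modifications of the proof for Theorem 4.1, it can be shown that $\bar{r}_{s_0}^*(S \setminus W, S)$ is
computed by the dynamic programming recursion

$$\bar{r}_{s_0}^*(S \setminus W, S) = \tilde{\mathcal{T}}_W^N(\mathbf{1}_{S \setminus W})(s_0), \quad s_0 \in S,$$

where the operator $\tilde{\mathcal{T}}_W$ is defined as

$$\tilde{\mathcal{T}}_W(J)(s) = \inf_{a \in C_a} \sup_{b \in C_b} \mathbf{1}_{S \setminus W}(s) + \mathbf{1}_W(s) H(s,a,b,J), \quad s \in S. \tag{4.22}$$

Combining this with (4.21), we then arrive at the following result for the computation of the
max-min safety probability.

Theorem 4.2. Let $\mathcal{H}$ be a DTSHG satisfying Assumption 4.1. Let $W \in \mathcal{B}(S)$ be a Borel safe set.
Then

$$p_{s_0}^*(W) = 1 - \tilde{\mathcal{T}}_W^N(\mathbf{1}_{S \setminus W})(s_0), \quad \forall s_0 \in S.$$

For completeness, we note that there exists an equivalent dynamic programming recursion to
compute the max-min safety probability, analogous to the one given in Abate et al. (2008) for the
single player case. Specifically, consider an operator $\mathcal{T}_W$ defined as

$$\mathcal{T}_W(J)(s) = \sup_{a \in C_a} \inf_{b \in C_b} \mathbf{1}_W(s) H(s,a,b,J), \quad s \in S. \tag{4.23}$$

The relation between $\tilde{\mathcal{T}}_W$ and $\mathcal{T}_W$ is established through the following lemma.

Lemma 4.4. For every $s \in S$ and $k = 0,1,\dots,N$,

$$\mathcal{T}_W^k(\mathbf{1}_W)(s) = 1 - \tilde{\mathcal{T}}_W^k(\mathbf{1}_{S \setminus W})(s).$$

Proof. We prove this result by induction on $k$. The case of $k = 0$ is established by the fact that
$\mathbf{1}_W = 1 - \mathbf{1}_{S \setminus W}$. Now suppose the identity holds for $k = h$. Then we have for every $s \in S$,

$$\begin{aligned}
\mathcal{T}_W^{h+1}(\mathbf{1}_W)(s) &= \mathcal{T}_W(\mathcal{T}_W^h(\mathbf{1}_W))(s) = \mathcal{T}_W(1 - \tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W}))(s) \\
&= \sup_{a \in C_a} \inf_{b \in C_b} \mathbf{1}_W(s) H(s,a,b, 1 - \tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W})) \\
&= \sup_{a \in C_a} \inf_{b \in C_b} \mathbf{1}_W(s) \left(1 - H(s,a,b, \tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W}))\right) \\
&= \mathbf{1}_W(s) + \sup_{a \in C_a} \inf_{b \in C_b} -\mathbf{1}_W(s) H(s,a,b, \tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W})).
\end{aligned}$$

It then follows that for every $s \in S$

$$\begin{aligned}
1 - \mathcal{T}_W^{h+1}(\mathbf{1}_W)(s) &= 1 - \mathbf{1}_W(s) - \sup_{a \in C_a} \inf_{b \in C_b} -\mathbf{1}_W(s) H(s,a,b, \tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W})) \\
&= \mathbf{1}_{S \setminus W}(s) + \inf_{a \in C_a} \sup_{b \in C_b} \mathbf{1}_W(s) H(s,a,b, \tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W})) \\
&= \tilde{\mathcal{T}}_W(\tilde{\mathcal{T}}_W^h(\mathbf{1}_{S \setminus W}))(s) = \tilde{\mathcal{T}}_W^{h+1}(\mathbf{1}_{S \setminus W})(s),
\end{aligned}$$

which completes the proof. □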
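
The identity in Lemma 4.4 can also be checked numerically on a small finite game. The sketch below is only a sanity check under assumed data: the random transition tensor `P`, the mask `in_W`, and all function names are hypothetical, and the operators are finite analogues of (4.22) and (4.23).

```python
import numpy as np

def safety_op(J, P, in_W):
    """Finite analogue of (4.23): sup_a inf_b 1_W(s) H(s, a, b, J)."""
    H = P @ J                                    # H[a, b, s]
    return np.where(in_W, H.min(axis=1).max(axis=0), 0.0)

def complement_op(J, P, in_W):
    """Finite analogue of (4.22): inf_a sup_b 1_{S minus W}(s) + 1_W(s) H(s, a, b, J)."""
    H = P @ J
    return np.where(in_W, H.max(axis=1).min(axis=0), 1.0)

def lemma_holds(P, in_W, N):
    """Check the k-fold identity of Lemma 4.4 for k = 1, ..., N."""
    J_safe = in_W.astype(float)                  # 1_W
    J_comp = 1.0 - J_safe                        # 1_{S minus W}
    for _ in range(N):
        J_safe = safety_op(J_safe, P, in_W)
        J_comp = complement_op(J_comp, P, in_W)
        if not np.allclose(J_safe, 1.0 - J_comp):
            return False
    return True

rng = np.random.default_rng(0)
P = rng.random((3, 2, 5, 5))                     # 3 inputs for player I, 2 for player II
P /= P.sum(axis=-1, keepdims=True)               # normalize rows into distributions
in_W = np.array([True, True, True, False, False])
ok = lemma_holds(P, in_W, N=6)
```

The check relies on each row of `P` summing to one, which mirrors the step $H(s,a,b,1-J) = 1 - H(s,a,b,J)$ used in the proof.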

Thus, an equivalent dynamic programming recursion for computing the max-min safety
probability is given by

$$p_{s_0}^*(W) = \mathcal{T}_W^N(\mathbf{1}_W)(s_0), \quad s_0 \in S. \tag{4.24}$$

Using either the operator $\tilde{\mathcal{T}}_W$ or the operator $\mathcal{T}_W$, we can also derive sufficient conditions of
optimality for players I and II, analogous to those given in (4.14) and (4.15).

4.4.4  Analytic  Reach-Avoid  Example 

We  illustrate  the  sequence  of  steps  associated  with  a  probabilistic  reachability  calculation  in  the 
context  of  the  jump  Markov  system  example  in  section  4.2.1.  In  particular,  consider  a  regulation 
problem  in  which  the  objective  of  player  I  is  to  drive  the  continuous  state  into  a  neighborhood  of 
the  origin,  while  staying  within  some  safe  operating  region.  In  this  case,  the  target  set  and  safe  set 
are  chosen  to  be  R  =  {71,72}  x  [—  and  W'  —  {<71 ,  <72}  x  [—2,2].  In  the  following,  we  will  solve 
for the max-min reach-avoid probability and player I policy over a single stage of the stochastic
game ($N = 1$).

Given the DTSHG model, the operator $H(s,a,b,J)$ for a hybrid state $s = (q_1,x)$ can be derived
as follows:

$$H((q_1,x),a,b,J) = \int_S J(s')\, \nu(ds' \mid (q_1,x),a,b) \tag{4.25}$$

$$= p_1 \int J(q_1, 2x + a + d + w)\, dw + (1 - p_1) \int J(q_2, 2x + a + d + w)\, dw.$$

For an initial condition $s_0 = (q_1,x_0)$, the max-min reach-avoid probability can then be computed
as


* 

l  suPaeCfl  mfbeCb  H  ( (quxo) ,  a,  b,  1/f) , 


l*o|  <  J, 

1*0 1  >  2, 

5  <  |*o|  <  2. 


(4.26) 


From equations (4.25) and (4.26), the analytic expression for the max-min reach-avoid probability
in mode $q_1$ is:


(R,w')  = 


(h 

1 


§-|*o|, 

0, 


l*o|  <  \ 

1  <  l*o|  <  \ 

2  l-^ol  —  8 

l*o|  >  §■ 


In the process of performing the dynamic programming step in (4.26), we also obtain a max-min
player I policy $\mu_0^*$ and a worst-case player II strategy $\gamma_0^*$ in mode $q_1$ satisfying the sufficient
conditions for optimality in (4.14) and (4.15):


— sgn(.ro),  |*o  >  \ 

-2*o,  |*o |  <  3, 


^((7i,*0),a) 


—  1 ,  2x()  +  a  <  0 

1,  2xo  +  a>0. 


Using a similar procedure, we can compute the max-min reach-avoid probability for an initial
condition $s_0 = (q_2,x_0)$ as


( 1,  l*o|  <  l 

r*0(R,W')  =  &(lR)(q2,xo)  =  U,  \  <  |*0 1  <  2 

[0,  |*o  |  >  2. 


Furthermore, a max-min player I policy and a worst-case player II strategy satisfying the sufficient
conditions for optimality in mode $q_2$ can be derived as follows:


Mo  (<?2,*o)  = 


sgn(.ro),  |*o  |  >2 
|*o  |  <  2, 


tv 

2*0 


^((«2,*0),fl)  = 


1, 


1, 


^.vo  +  a  <  0 
\xq  +  a>  0. 


As one considers more complicated system models, there may no longer be a closed-form
expression for the operator $\mathcal{T}$. This would then require a numerical approximation of the dynamic
programming procedure, as discussed previously in section 4.4.2.

4.5 Alternative Information Patterns


In  the  discussions  so  far,  we  have  considered  Stackelberg  formulations  of  the  probabilistic  safety 
and  reach-avoid  problems,  with  an  asymmetric  information  pattern  which  gives  an  advantage  to 
Player  II.  As  noted  previously,  this  selection  of  information  pattern  is  based  upon  a  conservative 
assumption,  commonly  made  within  the  context  of  robust  control,  that  the  intent  of  Player  I  might 
be  available  to  Player  II,  and  that  player  II  might  use  this  information  to  his/her  advantage.  While 
it  is  often  the  case  that  disturbances  or  adversaries  found  in  practical  applications  will  not  be  able 
to  observe  the  actual  inputs  selected  by  the  control,  a  control  policy  constructed  under  such  an 
assumption  is  nonetheless  robust  to  the  worst-case  behavior  of  the  adversary.  We  will  refer  to 
problem  formulations  of  such  type  as  Scenario  I. 

The  focus  of  this  section  is  to  explore  several  alternative  information  patterns  that  are  less 
conservative  in  the  assumption  on  player  input  selections.  Reachability  problems  formulated  under 
such  settings  may  be  of  interest  in  application  scenarios  beyond  those  traditionally  considered  in 
robust  control.  In  particular,  they  correspond  to  competitive  or  adversarial  scenarios  in  which 
the  control  has  equal  or  better  access  to  information  as  compared  with  the  adversary.  The  main 
results  of  this  section  are  as  follows.  Under  an  asymmetric  information  pattern  favoring  player 
I,  the  Stackelberg  solutions  to  the  probabilistic  safety  and  reach-avoid  problems  can  be  computed 
using  a  slight  modification  of  the  dynamic  programming  procedure  from  Scenario  I.  On  the  other 
hand,  under  a  symmetric  information  pattern,  the  existence  of  Nash  equilibria  for  the  safety  and 
reach-avoid  problems  in  general  requires  randomization  in  the  selection  of  player  I  and  player  II 
controls. 

4.5.1  Stackelberg  Formulation  Favoring  Player  I 

First, we consider an information pattern in which the control is allowed to select inputs in
response to the actions of the adversary at each stage of the dynamic process. Such a situation
could for example arise in a patrol and surveillance application in which the actions of an intruder
are captured by a surveillance system. Reachability problems formulated under this information
pattern  can  be  then  interpreted  as  zero-sum  Stackelberg  games  giving  an  advantage  to  player  I 
in  the  optimization  of  the  safety  or  reach-avoid  probability.  We  refer  to  this  type  of  problem 
formulation  as  Scenario  II. 

To  give  a  more  precise  definition  for  the  reachability  problems  in  Scenario  II,  it  is  necessary  to 
introduce  the  class  of  Markov  strategies  for  Player  I  and  the  class  of  Markov  policies  for  Player  II. 

Definition 4.5. A Markov strategy $\gamma_a$ for Player I is a sequence $\gamma_a = (\gamma_0^a, \gamma_1^a, \dots, \gamma_{N-1}^a)$ of universally
measurable maps $\gamma_k^a : S \times C_b \to C_a$, $k = 0,1,\dots,N-1$. The set of such strategies is denoted by $\Gamma_a$.

Definition 4.6. A Markov policy $\mu_b$ for Player II is a sequence $\mu_b = (\mu_0^b, \mu_1^b, \dots, \mu_{N-1}^b)$ of universally
measurable maps $\mu_k^b : S \to C_b$, $k = 0,1,\dots,N-1$. The set of such policies is denoted by $\mathcal{M}_b$.

We briefly note that Markov policies are a subclass of Markov strategies, namely they consist
of the set of Markov strategies which do not explicitly depend on the input of the other player.
More specifically, $\mathcal{M}_a \subset \Gamma_a$ and $\mathcal{M}_b \subset \Gamma_b$.

Using a similar construction as in section 4.2, we can define for a given Markov strategy $\gamma_a \in \Gamma_a$
and a given Markov policy $\mu_b \in \mathcal{M}_b$ a stochastic kernel describing the closed-loop hybrid state
evolution at time step $k$:

$$\nu_k^{\gamma_a,\mu_b}(\cdot \mid s_k) := \nu(\cdot \mid s_k, \gamma_k^a(s_k, \mu_k^b(s_k)), \mu_k^b(s_k)).$$

As before, this induces a probability measure, denoted by $\tilde{P}_{s_0}^{\gamma_a,\mu_b}$, on the sample space $\Omega$. Note
that if both players select Markov policies rather than Markov strategies, namely $\mu_a \in \mathcal{M}_a$ and
$\mu_b \in \mathcal{M}_b$, then the probability measures in Scenario I and II are equivalent: $P_{s_0}^{\mu_a,\mu_b} = \tilde{P}_{s_0}^{\mu_a,\mu_b}$.

Through the connection between the safety and reach-avoid problems as discussed in section
4.3, we will focus on the formulation of the probabilistic reach-avoid problem. Under Scenario II,
the payoff function for the reach-avoid problem becomes

$$\tilde{r}_{s_0}^{\gamma_a,\mu_b}(R,W') = \tilde{E}_{s_0}^{\gamma_a,\mu_b}\left[ \sum_{k=0}^{N} \left( \prod_{j=0}^{k-1} \mathbf{1}_{W' \setminus R}(s_j) \right) \mathbf{1}_R(s_k) \right], \tag{4.27}$$

where $\tilde{E}_{s_0}^{\gamma_a,\mu_b}$ denotes the expectation with respect to the probability measure $\tilde{P}_{s_0}^{\gamma_a,\mu_b}$ on the sample
space $\Omega$.

Under a zero-sum Stackelberg formulation of the reach-avoid problem, the worst-case
reach-avoid probability in Scenario II under a player I strategy $\gamma_a \in \Gamma_a$ is defined as

$$\tilde{r}_{s_0}^{\gamma_a}(R,W') := \inf_{\mu_b \in \mathcal{M}_b} \tilde{r}_{s_0}^{\gamma_a,\mu_b}(R,W'), \quad s_0 \in S. \tag{4.28}$$

The Stackelberg or max-min payoff for player I in Scenario II is then given by

$$\tilde{r}_{s_0}^*(R,W') := \sup_{\gamma_a \in \Gamma_a} \tilde{r}_{s_0}^{\gamma_a}(R,W'), \quad s_0 \in S. \tag{4.29}$$

The optimal strategy of player I and the optimal policy of player II are interpreted in terms of the
Stackelberg solutions to (4.28) and (4.29), in an analogous fashion as described for the problem
in Scenario I. The Stackelberg payoff in this case will be referred to as the max-min reach-avoid
probability for Scenario II, while a Stackelberg solution $(\gamma_a^*, \mu_b^*)$ will be referred to as a max-min
control strategy and a worst-case adversary policy, respectively.

A  precise  statement  of  the  probabilistic  reach-avoid  problem  in  Scenario  II  is  as  follows. 

Problem 4.2. Given a DTSHG $\mathcal{H}$, target set $R \in \mathcal{B}(S)$, and safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

(I) Compute the max-min reach-avoid probability for Scenario II: $\tilde{r}_{s_0}^*(R,W')$, $\forall s_0 \in S$;

(II) Find a max-min control strategy $\gamma_a^* \in \Gamma_a$, whenever it exists;

(III) Find a worst-case adversary policy $\mu_b^* \in \mathcal{M}_b$, whenever it exists.

Within a game theoretic context, this problem formulation can be interpreted in two different
ways. From the point of view of a static optimization problem, namely the selection of a player I
strategy $\gamma_a \in \Gamma_a$ and the selection of a player II policy $\mu_b \in \mathcal{M}_b$ with respect to the payoff function
$\tilde{r}_{s_0}^{\gamma_a,\mu_b}(R,W')$, it is a static Stackelberg game with player I as the leader. On the other hand,
from the point of view of a multi-stage dynamic game, the information structure in each stage of
the dynamic game involves player I selecting inputs in response to the actions of player II. Thus,
it can also be interpreted as a sequential or feedback Stackelberg game with player II as the leader.
For further details, the interested reader is referred to the discussions in Breton et al. (1988) and
Başar and Olsder (1999).

The  problem  formulations  in  Scenario  I  and  Scenario  II,  as  interpreted  in  terms  of  feedback 
Stackelberg  games,  differ  only  in  the  order  of  play  in  each  stage  of  the  dynamic  game.  In  particular, 
player  I  goes  first  in  Scenario  I,  but  second  in  Scenario  II.  Thus,  a  dynamic  programming  solution 
to  Problem  4.2  can  be  constructed  in  much  the  same  way  as  described  in  section  4.4,  except  for  an 
exchange  in  the  order  of  optimization  in  each  step  of  the  dynamic  programming  procedure.  More 
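precisely, consider a dynamic programming operator $\tilde{\mathcal{T}}$ operating on Borel measurable functions
from $S$ to $[0,1]$:

$$\tilde{\mathcal{T}}(J)(s) = \inf_{b \in C_b} \sup_{a \in C_a} \mathbf{1}_R(s) + \mathbf{1}_{W' \setminus R}(s) H(s,a,b,J), \quad s \in S. \tag{4.30}$$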

In  the  following,  we  state  a  dynamic  programming  result  for  Scenario  II.  The  proof  is  analogous 
to  the  one  given  in  section  4.4.1  for  Scenario  I  and  is  hence  omitted. 


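Theorem 4.3. Let $\mathcal{H}$ be a DTSHG satisfying Assumption 4.1. Let $R, W' \in \mathcal{B}(S)$ be Borel sets
such that $R \subseteq W'$. Let the operator $\tilde{\mathcal{T}}$ be defined as in (4.30). Then the composition
$\tilde{\mathcal{T}}^N = \tilde{\mathcal{T}} \circ \tilde{\mathcal{T}} \circ \cdots \circ \tilde{\mathcal{T}}$ ($N$ times) is well-defined and

(a) $\tilde{r}_{s_0}^*(R,W') = \tilde{\mathcal{T}}^N(\mathbf{1}_R)(s_0)$, $\forall s_0 \in S$;

(b) There exists a player I strategy $\gamma_a^* \in \Gamma_a$ and a player II policy $\mu_b^* \in \mathcal{M}_b$ satisfying

$$\tilde{r}_{s_0}^{\gamma_a,\mu_b^*}(R,W') \leq \tilde{r}_{s_0}^*(R,W') \leq \tilde{r}_{s_0}^{\gamma_a^*,\mu_b}(R,W'), \tag{4.31}$$

$\forall s_0 \in S$, $\gamma_a \in \Gamma_a$, and $\mu_b \in \mathcal{M}_b$. In particular, $\gamma_a^*$ is a max-min control strategy, and $\mu_b^*$ is a
worst-case adversary policy.

(c) Let $J_N^* = \mathbf{1}_R$, $J_k^* = \tilde{\mathcal{T}}^{N-k}(\mathbf{1}_R)$, $k = 0,1,\dots,N-1$. If $\gamma_a^* \in \Gamma_a$ is a player I strategy which satisfies

$$\gamma_k^{a,*}(s,b) \in \arg \sup_{a \in C_a} H(s,a,b,J_{k+1}^*), \tag{4.32}$$

$\forall s \in W' \setminus R$, $b \in C_b$, $k = 0,1,\dots,N-1$, then $\gamma_a^*$ is a max-min control strategy. If $\mu_b^* \in \mathcal{M}_b$ is
a player II policy which satisfies

$$\mu_k^{b,*}(s) \in \arg \inf_{b \in C_b} \sup_{a \in C_a} H(s,a,b,J_{k+1}^*), \tag{4.33}$$

$\forall s \in W' \setminus R$, $k = 0,1,\dots,N-1$, then $\mu_b^*$ is a worst-case adversary policy.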
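
For a game with finitely many states and inputs, the recursion of statement (a) and the selections in (4.32) and (4.33) can be sketched as follows. This is an illustrative sketch under assumed data, not an implementation from this work: the transition tensor `P`, the masks `in_R` and `in_W`, and the function name `synthesize_scenario2` are hypothetical.

```python
import numpy as np

def synthesize_scenario2(P, in_R, in_W, N):
    """Inf-sup recursion of Scenario II, storing a best-response strategy for
    player I (indexed by state and adversary input) and a policy for player II.

    P[a, b, s, t] is an assumed finite transition tensor; in_R and in_W are
    boolean masks of the target set R and safe set W' (R contained in W').
    """
    n_a, n_b, n_states, _ = P.shape
    J = in_R.astype(float)                               # J_N = 1_R
    strategy = np.zeros((N, n_states, n_b), dtype=int)   # gamma_k(s, b)
    policy_b = np.zeros((N, n_states), dtype=int)        # mu_k(s)
    for k in range(N - 1, -1, -1):
        H = P @ J                                        # H[a, b, s]
        strategy[k] = H.argmax(axis=0).T                 # best response to each b, cf. (4.32)
        best = H.max(axis=0)                             # sup over a, shape (n_b, n_states)
        policy_b[k] = best.argmin(axis=0)                # inf over b, cf. (4.33)
        J = np.where(in_R, 1.0, np.where(in_W, best.min(axis=0), 0.0))
    return J, strategy, policy_b

# Toy instance: node 0 is unsafe, node 1 lies in W' minus R, node 2 is the target.
in_R = np.array([False, False, True])
in_W = np.array([False, True, True])
p_reach = np.array([[0.9, 0.2], [0.3, 0.8]])             # P(node 1 -> node 2) for each (a, b)
P = np.zeros((2, 2, 3, 3))
P[:, :, 0, 0] = 1.0
P[:, :, 2, 2] = 1.0
P[:, :, 1, 2] = p_reach
P[:, :, 1, 0] = 1.0 - p_reach
J0, strategy, policy_b = synthesize_scenario2(P, in_R, in_W, N=1)
```

In this toy instance player I matches whichever input the adversary selects, and the resulting value at node 1 is 0.8, whereas the sup-inf order of Scenario I would give only 0.3 on the same data.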
The corresponding result for the safety problem can be derived in a similar manner as described
in section 4.4.3 for Scenario I.

Given the differing interpretations of the problem formulation in Scenario II, there are
correspondingly two different interpretations of the Stackelberg solution. In particular, from the point
of view of a static optimization problem, the max-min control strategy is a selection of control
strategy which optimizes the worst-case probability of achieving the reach-avoid specification over
the strategy class $\Gamma_a$. This is analogous to the robustness and optimality properties of the max-min
control policy in Scenario I, as discussed in section 4.4.2. On the other hand, from the perspective
of a multi-stage dynamic game, one can infer from (4.32) that each component $\gamma_k^{a,*}$ of the max-min
control strategy is a best response function with respect to selections of inputs by player II in
each stage of the dynamic game.

It is intuitive that with an asymmetry in the information pattern favoring player I, the max-min
reach-avoid probability in Scenario II should be equal to or higher than the max-min reach-avoid
probability in Scenario I. This is confirmed by the form of the dynamic programming recursion in
statement (a) of Theorem 4.3. In particular, due to the exchange in the order of optimization, we
have

$$\mathcal{T}(J)(s) \leq \tilde{\mathcal{T}}(J)(s), \quad \forall s \in S,$$

for every Borel measurable function $J$ from $S$ to $[0,1]$, which implies that
$r_{s_0}^*(R,W') \leq \tilde{r}_{s_0}^*(R,W')$, $\forall s_0 \in S$. Moreover, as will become apparent in the
discussions of the following subsection, this inequality is in general strict.
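
The effect of exchanging the order of optimization can be seen already in a single stage with two inputs per player. The payoff matrix below is a made-up example (not the system studied in this chapter), with `H[a, b]` playing the role of $H(s,a,b,J)$ at a fixed state.

```python
import numpy as np

def maxmin_and_minmax(H):
    """Return (sup_a inf_b H[a, b], inf_b sup_a H[a, b]) for a finite payoff matrix."""
    lower = H.min(axis=1).max()                  # player I commits first
    upper = H.max(axis=0).min()                  # player II commits first
    return float(lower), float(upper)

H = np.array([[1.0, 0.0],
              [0.0, 1.0]])                       # hypothetical one-stage payoffs
lower, upper = maxmin_and_minmax(H)
```

For this matrix the two orders give 0 and 1 respectively, so the gap between the two information patterns can be as large as the payoff range allows.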

4.5.2  Nash  Formulation 

In  many  practical  application  scenarios,  it  is  often  reasonable  to  assume  a  symmetric  information 
pattern  in  which  both  Player  I  and  Player  II  make  decisions  based  only  upon  the  state  of  the  system 
at  each  time  step.  This  is  in  fact  the  typical  assumption  in  many  competitive  economic  models. 
More  generally,  it  is  applicable  within  the  context  of  a  control  problem  in  which  the  control  and 
the  adversary  can  be  modeled  as  acting  simultaneously,  and  hence  unaware  of  each  other’s  intent 
(de  Alfaro  et  al.,  2007).  This  could  be  for  example  an  aircraft  conflict  resolution  problem  in  which 
the  aircraft  involved  only  broadcast  their  position  and  heading  information,  but  not  their  future 
intent.  We  will  refer  to  reachability  problem  formulations  with  this  type  of  information  pattern  as 
Scenario  III. 

While  a  symmetric  information  pattern  may  be  attractive  from  a  modeling  standpoint,  the  ex¬ 
istence  of  equilibrium  player  strategies,  in  the  sense  of  a  Nash  or  saddle  point  equilibrium  (Nash, 
1951),  typically  requires  the  consideration  of  randomized  strategies.  Computationally,  such  strate¬ 
gies  are  often  significantly  more  difficult  to  synthesize  as  compared  with  non-randomized  strate¬ 
gies,  due  to  their  large  representation  size.  Moreover,  the  practical  implementation  of  such  strate¬ 
gies  can  be  questionable  in  certain  applications.  For  example,  within  the  context  of  air  traffic 
management,  it  is  of  interest  to  system  operators  to  devise  conflict  resolution  strategies  which 
result  in  predictable  aircraft  behaviors. 

As  will  be  shown  in  this  subsection,  when  we  only  consider  non-randomized  Markov  policies 
in  Scenario  III,  there  does  not  exist  in  general  a  Nash  equilibrium  solution  to  the  probabilistic  safety 
or reach-avoid problem. However, for such cases, the Stackelberg payoff as computed in Scenario I
corresponds to the lower value of the symmetric dynamic game, and hence a conservative estimate
of  the  payoff  for  player  I.  On  the  other  hand,  if  randomized  Markov  policies  are  considered,  a 
Nash  equilibrium  is  shown  to  exist,  under  the  same  set  of  assumptions  on  the  DTSHG  model 
as  in  Scenario  I  and  II.  In  such  cases,  however,  the  problem  becomes  one  of  computation  and 
implementation  of  randomized  policies,  as  discussed  above. 
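
To give a feel for why randomization helps, the sketch below considers a single-stage game with two inputs per player and lets player I randomize between the two inputs with probability `p`. The payoff matrix and the grid search over `p` are assumptions chosen for illustration; this is not the construction used to establish existence of equilibria in this work.

```python
import numpy as np

def mixed_lower_value(H, n_grid=1001):
    """Grid search for max over p in [0, 1] of min_b (p * H[0, b] + (1 - p) * H[1, b]).

    H is a payoff matrix with exactly two rows (the two inputs of player I).
    """
    p = np.linspace(0.0, 1.0, n_grid)
    expected = np.outer(p, H[0]) + np.outer(1.0 - p, H[1])   # shape (n_grid, n_b)
    worst = expected.min(axis=1)                 # adversary picks the worst column
    best = int(worst.argmax())
    return float(worst[best]), float(p[best])

H = np.array([[1.0, 0.0],
              [0.0, 1.0]])                       # hypothetical one-stage payoffs
value, p_star = mixed_lower_value(H)
pure_lower = float(H.min(axis=1).max())          # best guarantee without randomization
```

Without randomization player I can guarantee nothing in this game (`pure_lower` is 0), while mixing the two inputs evenly guarantees an expected payoff of one half. The price is that the policy now has to store a probability distribution over inputs rather than a single input.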

In order to define the problem formally, we will first specify the information structure in
Scenario III. Consistent with the assumption of symmetric access to information, Player I is
constrained to choose Markov policies $\mu_a$ within the class $\mathcal{M}_a$, while Player II is constrained to choose
Markov policies $\mu_b$ within the class $\mathcal{M}_b$. By the discussions of the preceding subsection, under
fixed choices of policies $\mu_a \in \mathcal{M}_a$ and $\mu_b \in \mathcal{M}_b$, the probability measures $P_{s_0}^{\mu_a,\mu_b}$ and $\tilde{P}_{s_0}^{\mu_a,\mu_b}$ are
well-defined and equivalent, for every $s_0 \in S$. It then follows that under a fixed pair of policies
$(\mu_a, \mu_b)$, we have $r_{s_0}^{\mu_a,\mu_b}(R,W') = \tilde{r}_{s_0}^{\mu_a,\mu_b}(R,W')$, $\forall s_0 \in S$.

Consistent with the typical analysis procedure of symmetric zero-sum games (see for example
Başar and Olsder, 1999), we now define the lower value and upper value of the reach-avoid
problem in Scenario III.

Definition 4.7. The lower value of the probabilistic reach-avoid problem in Scenario III is defined
as

$$r_{s_0}^{l}(R,W') := \sup_{\mu_a \in \mathcal{M}_a} \inf_{\mu_b \in \mathcal{M}_b} r_{s_0}^{\mu_a,\mu_b}(R,W'), \quad s_0 \in S. \tag{4.34}$$

Definition 4.8. The upper value of the probabilistic reach-avoid problem in Scenario III is defined
as

$$r_{s_0}^{u}(R,W') := \inf_{\mu_b \in \mathcal{M}_b} \sup_{\mu_a \in \mathcal{M}_a} r_{s_0}^{\mu_a,\mu_b}(R,W'), \quad s_0 \in S. \tag{4.35}$$

The lower value corresponds to the case in which Player I declares his/her policy to Player II,
while the upper value corresponds to the case in which Player II declares his/her policy to Player
I. It can be checked that the following inequality always holds:

$$r_{s_0}^{l}(R,W') \leq r_{s_0}^{u}(R,W'), \quad s_0 \in S,$$

which  agrees  with  the  intuition  that  the  player  who  declares  his/her  policy  first  is  at  a  disadvantage. 
Clearly,  in  a  symmetric  dynamic  game,  neither  player  would  reveal  his/her  policy  to  the  other  ahead 
of  time.  Thus,  equation  (4.34)  should  be  interpreted  as  a  conservative  calculation  of  the  payoff 
from  the  point  of  view  of  Player  I,  while  equation  (4.35)  should  be  interpreted  as  a  conservative 
calculation  of  the  cost  from  the  point  of  view  of  Player  II. 

In the case that the upper and lower values are equal, it may be possible to construct Nash
equilibrium strategies for both players. Specifically, consider the case in which

$$r_{s_0}^{l}(R,W') = r_{s_0}^{u}(R,W'), \quad \forall s_0 \in S. \tag{4.36}$$

Suppose for now that the outer supremum in (4.34) is achieved by some Markov policy $\mu_a^* \in \mathcal{M}_a$
and that the outer infimum in (4.35) is achieved by some Markov policy $\mu_b^* \in \mathcal{M}_b$. Then it can be
checked that $r_{s_0}^{\mu_a^*,\mu_b^*}(R,W') = r_{s_0}^{l}(R,W') = r_{s_0}^{u}(R,W')$, $\forall s_0 \in S$, and for any $\mu_a \in \mathcal{M}_a$, $\mu_b \in \mathcal{M}_b$ we
have that

$$r_{s_0}^{\mu_a,\mu_b^*}(R,W') \leq r_{s_0}^{\mu_a^*,\mu_b^*}(R,W') \leq r_{s_0}^{\mu_a^*,\mu_b}(R,W'), \quad \forall s_0 \in S. \tag{4.37}$$

Thus, $\mu_a^*$ can be interpreted as an optimal policy for Player I in the sense that if Player II chooses
$\mu_b^*$, then the payoff for Player I can be no greater than $r_{s_0}^{\mu_a^*,\mu_b^*}(R,W')$. In a similar manner, $\mu_b^*$ can be
interpreted as an optimal policy for Player II. Using terminology from noncooperative game theory
(Başar and Olsder, 1999), for any $R$, $W'$ such that (4.36) holds, we say that the probabilistic
reach-avoid problem in Scenario III has a value, and any pair $(\mu_a^*, \mu_b^*)$ which satisfies (4.37) is referred to
as a saddle point or Nash equilibrium solution to the reach-avoid problem.

With  these  preliminaries,  a  precise  statement  of  the  probabilistic  reach-avoid  problem  in  Sce¬ 
nario  III  can  be  given  as  follows. 

Problem 4.3. Given a DTSHG $\mathcal{H}$, target set $R \in \mathcal{B}(S)$, and safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

(I) Compute $r_{s_0}^{l}(R,W')$ and $r_{s_0}^{u}(R,W')$, $\forall s_0 \in S$;

(II) If the probabilistic reach-avoid problem has a value, find a saddle point solution $(\mu_a^*, \mu_b^*)$.

As  it  turns  out,  there  is  a  correspondence  relation  between  the  upper  and  lower  value  of  Sce¬ 
nario  III  and  the  Stackelberg  payoffs  in  Scenario  I  and  II.  This  is  established  through  the  following 
proposition. 

Proposition 4.4. Let $\mathcal{H}$ be a DTSHG satisfying Assumption 4.1. Let $R, W' \in \mathcal{B}(S)$ be Borel sets
such that $R \subseteq W'$. Then under the information pattern of Scenario III, the following identities hold:

(a) $r_{s_0}^{l}(R,W') = r_{s_0}^*(R,W')$, $\forall s_0 \in S$.

(b) $r_{s_0}^{u}(R,W') = \tilde{r}_{s_0}^*(R,W')$, $\forall s_0 \in S$.

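Proof. We will prove the result for the lower value (part (a) above); the proof for the upper value
is analogous. First, for each $\mu_a \in \mathcal{M}_a$, we have by the definition of the worst-case reach-avoid
probability in Scenario I that

$$\inf_{\gamma_b \in \Gamma_b} r_{s_0}^{\mu_a,\gamma_b}(R,W') \leq \inf_{\mu_b \in \mathcal{M}_b} r_{s_0}^{\mu_a,\mu_b}(R,W'), \quad \forall s_0 \in S.$$

Then it follows by the definition of the lower value in Scenario III that

$$r_{s_0}^*(R,W') \leq r_{s_0}^{l}(R,W'), \quad \forall s_0 \in S.$$

Second, by Proposition 4.3(b), there exists a Player II strategy $\gamma_b^* \in \Gamma_b$ such that for any Player
I policy $\mu_a \in \mathcal{M}_a$, we have

$$r_{s_0}^{\mu_a,\gamma_b^*}(R,W') \leq \mathcal{T}^N(\mathbf{1}_R)(s_0), \quad \forall s_0 \in S.$$

For each $\mu_a \in \mathcal{M}_a$, consider a choice of Player II policy $\mu_b \in \mathcal{M}_b$ defined by

$$\mu_k^b(s) = \gamma_k^{b,*}(s, \mu_k^a(s)), \quad s \in S, \ k = 0,1,\dots,N-1.$$

Then it follows that

$$\nu_k^{\mu_a,\mu_b}(\cdot \mid s) = \nu_k^{\mu_a,\gamma_b^*}(\cdot \mid s), \quad \forall s \in S, \ k = 0,1,\dots,N-1.$$

From this we can deduce that for each $\mu_a \in \mathcal{M}_a$, there exists $\mu_b \in \mathcal{M}_b$ such that, for any $s_0 \in S$,
the following inequality holds

$$r_{s_0}^{\mu_a,\mu_b}(R,W') = r_{s_0}^{\mu_a,\gamma_b^*}(R,W') \leq \mathcal{T}^N(\mathbf{1}_R)(s_0).$$

Then by the result of Theorem 4.1, we have, for each $\mu_a \in \mathcal{M}_a$ and $s_0 \in S$,

$$\inf_{\mu_b \in \mathcal{M}_b} r_{s_0}^{\mu_a,\mu_b}(R,W') \leq r_{s_0}^*(R,W'),$$

and hence $r_{s_0}^{l}(R,W') \leq r_{s_0}^*(R,W')$, $\forall s_0 \in S$, which completes the proof. □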

In  other  words,  the  Stackelberg  payoff  of  Scenario  I  can  be  interpreted  as  the  lower  value  of 
Scenario  III,  while  the  Stackelberg  payoff  of  Scenario  II  can  be  interpreted  as  the  upper  value  of 
Scenario  III.  A  sufficient  condition  for  the  existence  of  value  and  saddle  point  solution  in  Scenario 
III  can  be  then  given  in  terms  of  the  dynamic  programming  operators  in  Scenario  I  and  II. 

Proposition 4.5. Let $\mathcal{H}$ be a DTSHG satisfying Assumption 4.1. Let $R, W' \in \mathcal{B}(S)$ be Borel sets
such that $R \subseteq W'$. If the operator $H$, as defined in (4.12), satisfies

$$\sup_{a \in C_a} \inf_{b \in C_b} H(s,a,b,J) = \inf_{b \in C_b} \sup_{a \in C_a} H(s,a,b,J) \tag{4.38}$$

for every $s \in S$ and $J \in \mathcal{F}$, then

(a) The probabilistic reach-avoid problem in Scenario III has a value;

(b) There exists a player I policy $\mu_a^* \in \mathcal{M}_a$ and a player II policy $\mu_b^* \in \mathcal{M}_b$ such that $(\mu_a^*, \mu_b^*)$
forms a saddle-point solution to the reach-avoid problem;

(c) Let $J_N^* = \mathbf{1}_R$, $J_k^* = \mathcal{T}^{N-k}(\mathbf{1}_R)$, $k = 0,1,\dots,N-1$. If $\mu_a^* \in \mathcal{M}_a$ is a player I policy which satisfies

$$\mu_k^{a,*}(s) \in \arg \sup_{a \in C_a} \inf_{b \in C_b} H(s,a,b,J_{k+1}^*), \tag{4.39}$$

$\forall s \in W' \setminus R$, $k = 0,1,\dots,N-1$, and if $\mu_b^* \in \mathcal{M}_b$ is a player II policy which satisfies

$$\mu_k^{b,*}(s) \in \arg \inf_{b \in C_b} \sup_{a \in C_a} H(s,a,b,J_{k+1}^*), \tag{4.40}$$

$\forall s \in W' \setminus R$, $k = 0,1,\dots,N-1$, then $(\mu_a^*, \mu_b^*)$ is a saddle-point solution.
Proof. Suppose (4.38) holds; then we have $\mathcal{T}(J) = \tilde{\mathcal{T}}(J)$, $\forall J \in \mathcal{F}$. This in turn implies that

$$r_{s_0}^*(R,W') = \mathcal{T}^N(\mathbf{1}_R)(s_0) = \tilde{\mathcal{T}}^N(\mathbf{1}_R)(s_0) = \tilde{r}_{s_0}^*(R,W'), \quad \forall s_0 \in S.$$

Combining this with Proposition 4.4, statement (a) follows.

In such a case, we have by Theorem 4.1(b) that there exists a max-min control policy $\mu_a^* \in \mathcal{M}_a$
for Scenario I satisfying

$$r_{s_0}^*(R,W') \leq r_{s_0}^{\mu_a^*,\gamma_b}(R,W'), \quad \forall s_0 \in S, \ \gamma_b \in \Gamma_b.$$

Moreover, we have by Theorem 4.3(b) that there exists a worst-case adversary policy $\mu_b^* \in \mathcal{M}_b$ for
Scenario II satisfying

$$\tilde{r}_{s_0}^{\gamma_a,\mu_b^*}(R,W') \leq \tilde{r}_{s_0}^*(R,W'), \quad \forall s_0 \in S, \ \gamma_a \in \Gamma_a.$$

With another application of Proposition 4.4, the pair $(\mu_a^*, \mu_b^*)$ can be shown to be a saddle point
solution to the reach-avoid problem.

Finally, if $\mu_a^* \in \mathcal{M}_a$ is a player I policy satisfying (4.39), then $\mu_a^*$ is a max-min control policy
satisfying the conditions of Theorem 4.1(b). Moreover, if $\mu_b^* \in \mathcal{M}_b$ is a player II policy
satisfying (4.40), then $\mu_b^*$ is a worst-case adversary policy satisfying the conditions of Theorem 4.3(b).
Statement (c) then follows. □

Equations of the form (4.38) are often referred to in the literature as minimax conditions. Efforts to
establish conditions under which such equations hold can be traced back to von Neumann's minimax
theorem (von Neumann and Morgenstern, 1944), which has since been generalized by several
authors (see for example Fan, 1953; Sion, 1958). The following set of conditions is due to Fan
(1953):

Assumption 4.2.

• $C_b$ is a compact Hausdorff space;

• For every $s \in S$, $a \in C_a$, and $J \in \mathcal{F}$, the function $b \mapsto H(s,a,b,J)$ is lower semicontinuous
and convexlike, namely for any $b_1, b_2 \in C_b$ and $\lambda \in [0,1]$, there exists $b_0 \in C_b$ such that
$H(s,a,b_0,J) \le \lambda H(s,a,b_1,J) + (1-\lambda) H(s,a,b_2,J)$;

• For every $s \in S$, $b \in C_b$, and $J \in \mathcal{F}$, the function $a \mapsto H(s,a,b,J)$ is concavelike, namely for
any $a_1, a_2 \in C_a$ and $\lambda \in [0,1]$, there exists $a_0 \in C_a$ such that
$H(s,a_0,b,J) \ge \lambda H(s,a_1,b,J) + (1-\lambda) H(s,a_2,b,J)$.

Under Assumption 4.2, we have by Theorem 2 of Fan (1953) that (4.38) holds, and hence
a saddle-point solution exists. Intuitively speaking, this assumption requires that the function
$(a,b) \mapsto H(s,a,b,J)$ be “saddle-like” for each fixed $s$ and $J$. If one restricts attention to
non-randomized control policies, as in the discussion so far, this condition can be rather restrictive.
In particular, it is well known that there exist finite-state games in which these conditions do not
hold and a pure strategy equilibrium does not exist. One such example, adapted from de Alfaro
et al. (2007), is given below.

Consider a two-state system with the state space $S = \{q_1, q_2\}$ and action spaces $C_a = \{1,2\}$,
$C_b = \{1,2\}$. The transitions between the discrete states are described in terms of a discrete
transition relation $\delta : S \times C_a \times C_b \to S$ defined as follows: $\delta(q_1,a,b) = q_1$ if $a \neq b$, and
$\delta(q_1,a,b) = q_2$ otherwise; $\delta(q_2,a,b) = q_2$, $\forall a,b$. This is illustrated in Figure 4.1. The
corresponding transition kernel for the DTSHG model can be derived in a straightforward manner from this diagram.


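[Figure 4.1 diagram: self-loop on $q_1$ labeled $a \neq b$; edge from $q_1$ to $q_2$ labeled $a = b$; self-loop on $q_2$ labeled $a = 1,2$; $b = 1,2$.]

Figure 4.1: Two-state example to illustrate equilibrium strategies in symmetric dynamic games.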


Now suppose that $q_1$ is a safe state and $q_2$ is an unsafe state, so that $W = \{q_1\}$. In the case
that input selections are deterministic, it is intuitive that if player II were allowed to observe the
inputs of player I, as in Scenario I, then player II can always select an input that drives the
system state into $q_2$ in one step. On the other hand, if player I is allowed to observe the inputs of
player II, as in Scenario II, then player I can always select an input at each time step to
keep the system state in $q_1$. In particular, one can verify that if only non-randomized strategies are
considered, the safety probability in Scenario I is zero, while the safety probability in Scenario II
is one, over any time horizon $N \ge 1$.

On the other hand, if the players are allowed to randomize their selection of inputs, namely
player I is allowed to select $a = 1$ with probability $p_a$ and player II is allowed to select $b = 1$ with
probability $p_b$, then one can view this as a dynamic game in which the player inputs are $p_a \in [0,1]$
and $p_b \in [0,1]$. The operator $H$ with respect to these randomized strategies then takes on the
following form:

$$H(q,p_a,p_b,J) = \begin{cases} J(q_1)\,(p_a + p_b - 2 p_a p_b) + J(q_2)\,(1 - p_a - p_b + 2 p_a p_b), & q = q_1, \\ J(q_2), & q = q_2. \end{cases}$$

Clearly, $H$ is concave in $p_a$ and convex in $p_b$, and hence satisfies the minimax condition (4.38).
In particular, over one stage of the dynamic game, the value is 0.5 in state $q_1$, and the equilibrium
strategies are given by $p_a = p_b = 0.5$.
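The one-stage values quoted for this example can be checked numerically. The sketch below assumes the safety payoff $J = 1_W$, that is $J(q_1) = 1$ and $J(q_2) = 0$, and evaluates $H(q_1, p_a, p_b, J)$ first on the pure inputs only and then on a grid of randomized inputs. The function names and the grid resolution are illustrative choices, not part of the original development.

```python
def H_q1(pa, pb, J1=1.0, J2=0.0):
    """H(q1, pa, pb, J): the state stays in q1 exactly when the two inputs differ."""
    p_stay = pa + pb - 2.0 * pa * pb
    return J1 * p_stay + J2 * (1.0 - p_stay)

def maxmin_minmax(grid):
    """Return (max_a min_b H, min_b max_a H) over the given grid of inputs."""
    lower = max(min(H_q1(pa, pb) for pb in grid) for pa in grid)
    upper = min(max(H_q1(pa, pb) for pa in grid) for pb in grid)
    return lower, upper

pure = [0.0, 1.0]                       # non-randomized inputs only
mixed = [i / 100 for i in range(101)]   # randomized inputs on a uniform grid

print("pure :", maxmin_minmax(pure))    # the two orderings disagree
print("mixed:", maxmin_minmax(mixed))   # the two orderings agree near 0.5
```

With pure inputs the two orderings of max and min differ, which mirrors the gap between Scenario I and Scenario II; on the randomized grid both orderings agree at approximately 0.5.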

In this simple example, it can be observed that randomized strategies induce a natural convexity
structure in the dynamic programming operator. This is one of the primary reasons that Nash
equilibria often exist only in mixed or randomized strategies, rather than in pure or non-randomized
strategies. In the following, we will proceed to show that this observation in fact holds for the
probabilistic reachability problems under consideration.

Specifically, we associate with each DTSHG model $\mathcal{H} = (Q, n, C_a, C_b, \nu_x, \nu_q, \nu_r)$ a randomized
input DTSHG model $\mathcal{H}' = (Q, n, C_a', C_b', \nu_x', \nu_q', \nu_r')$, defined as follows:

• $C_a' = \mathcal{P}(C_a)$, $C_b' = \mathcal{P}(C_b)$;

• For every $s \in S$, $P_a \in \mathcal{P}(C_a)$, and $P_b \in \mathcal{P}(C_b)$,

$$\nu_x'(\cdot \mid s, P_a, P_b) := \int_{C_b} \int_{C_a} \nu_x(\cdot \mid s,a,b)\, P_a(da)\, P_b(db);$$

• For every $s \in S$, $P_a \in \mathcal{P}(C_a)$, and $P_b \in \mathcal{P}(C_b)$,

$$\nu_q'(\cdot \mid s, P_a, P_b) := \int_{C_b} \int_{C_a} \nu_q(\cdot \mid s,a,b)\, P_a(da)\, P_b(db);$$

• For every $s \in S$, $P_a \in \mathcal{P}(C_a)$, $P_b \in \mathcal{P}(C_b)$, and $q' \in Q$,

$$\nu_r'(\cdot \mid s, P_a, P_b, q') := \int_{C_b} \int_{C_a} \nu_r(\cdot \mid s,a,b,q')\, P_a(da)\, P_b(db).$$

In the above, $\mathcal{P}(C_a)$ (resp. $\mathcal{P}(C_b)$) denotes the space of probability measures on $C_a$ (resp. $C_b$).
Since $C_a$ and $C_b$ are compact Borel spaces, $\mathcal{P}(C_a)$ and $\mathcal{P}(C_b)$ are also compact Borel spaces (see
for example Bertsekas and Shreve, 1978, section 7.4). Furthermore, if a DTSHG model satisfies
Assumption 4.1, then one can verify using Proposition 7.21 of Bertsekas and Shreve (1978) that
the associated randomized input model also satisfies Assumption 4.1.
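For finite input sets, the integrals in this construction reduce to weighted sums. The following sketch illustrates the idea for a single state, under the assumption that the kernel is stored as a nested list `nu[a][b]` of next-state probabilities; the function name `mix_kernel` and the data layout are hypothetical and chosen only for illustration.

```python
def mix_kernel(nu, Pa, Pb):
    """Average a finite kernel over independent input distributions.

    nu[a][b] is a list of next-state probabilities for the input pair (a, b);
    Pa and Pb are probability vectors over the inputs of players I and II.
    """
    n_next = len(nu[0][0])
    out = [0.0] * n_next
    for a, wa in enumerate(Pa):
        for b, wb in enumerate(Pb):
            for j in range(n_next):
                out[j] += wa * wb * nu[a][b][j]
    return out

# Kernel of the two-state example at q1: stay in q1 if the inputs differ,
# move to q2 if they coincide. Next-state order is [q1, q2].
nu_q1 = [[[0.0, 1.0], [1.0, 0.0]],
         [[1.0, 0.0], [0.0, 1.0]]]

print(mix_kernel(nu_q1, [0.5, 0.5], [0.5, 0.5]))
print(mix_kernel(nu_q1, [0.3, 0.7], [0.8, 0.2]))
```

With $P_a = (p_a, 1 - p_a)$ and $P_b = (p_b, 1 - p_b)$, the first entry of the result equals $p_a + p_b - 2 p_a p_b$, in agreement with the expression for $H$ in the two-state example.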

The probabilistic safety and reach-avoid problems defined in terms of the model $\mathcal{H}'$ allow
player policies which select inputs from randomized input spaces. In particular, player I (resp.
player II) is allowed to select policies from the Markov policy space $\mathcal{M}_a'$ (resp. $\mathcal{M}_b'$) defined in
terms of the input space $\mathcal{P}(C_a)$ (resp. $\mathcal{P}(C_b)$).

For a given DTSHG model $\mathcal{H}$, we say that the probabilistic reach-avoid problem in Scenario
III has a value with respect to randomized policies if the reach-avoid problem defined in terms of
$\mathcal{H}'$ has a value. Moreover, we call a saddle-point solution $(\mu_a^*, \mu_b^*)$ to the reach-avoid problem
defined in terms of $\mathcal{H}'$ a randomized saddle-point solution. In the following, we will show that
under the same set of assumptions as in Scenarios I and II, the reach-avoid problem in Scenario III
has a randomized saddle-point solution.

Proposition 4.6. Let $\mathcal{H}$ be a DTSHG satisfying Assumption 4.1. Let $R, W' \in \mathcal{B}(S)$ be Borel sets
such that $R \subseteq W'$. Then

(a) The probabilistic reach-avoid problem in Scenario III has a value with respect to randomized
policies;

(b) There exists a randomized saddle-point solution to the reach-avoid problem.

Proof. Let $\mathcal{H}'$ be the randomized input DTSHG model associated with $\mathcal{H}$. Then for a given
$s \in S$, $P_a \in C_a'$, $P_b \in C_b'$, and $J \in \mathcal{F}$, the operator $H$ can be written as

$$H(s,P_a,P_b,J) = \int_{C_b} \int_{C_a} \int_S J(s')\, \tau(ds' \mid s,a,b)\, P_a(da)\, P_b(db).$$

It can be verified that $H$ defined as above satisfies Assumption 4.2. Specifically, given that $C_b' =
\mathcal{P}(C_b)$ is a compact Borel space, the first condition is satisfied. Furthermore, since $\mathcal{H}'$ satisfies
Assumption 4.1, the function $(P_a, P_b) \mapsto H(s,P_a,P_b,J)$ is continuous by the proof of Proposition
4.1. Finally, for any $P_a^1, P_a^2 \in C_a'$ and $\lambda \in [0,1]$, we have $P_a^0 := \lambda P_a^1 + (1-\lambda) P_a^2 \in C_a'$ and

$$\lambda H(s,P_a^1,P_b,J) + (1-\lambda) H(s,P_a^2,P_b,J) = H(s,P_a^0,P_b,J).$$

This implies that $H$ is concavelike on $C_a'$. In a similar manner, one can show that $H$ is convexlike
on $C_b'$. Thus, the conditions of Assumption 4.2 are verified.

By Theorem 2 of Fan (1953), we have that the minimax condition (4.38) holds. The statements
of the proposition then follow by an application of Proposition 4.5. □

4.6  Infinite  Horizon  Properties 

In this section, we will consider infinite horizon formulations of probabilistic reachability problems.
This type of formulation is particularly relevant within the context of safety problems, in
which case the specifications for many practical applications are to enforce the safety property
for all time (i.e. an invariance specification), rather than over some given finite time horizon.
Moreover, the investigation of infinite horizon problems opens the possibility for stationary control
policies. As compared with the time-varying policies generated by finite horizon reachability
computations, stationary policies in general have a smaller representation size, and are easier to
implement in practice.

From a theoretical perspective, there are several issues at hand when considering infinite horizon
optimal control or dynamic game problems:

• The mathematical characterization of the infinite horizon payoff as the solution to an appropriate
fixed-point equation;

• The convergence of the finite horizon payoffs to the infinite horizon payoff;

• The existence of optimal infinite horizon control policies, stationary or otherwise.

In  the  case  that  the  convergence  property  above  holds,  one  can  approximate  the  infinite  horizon 
payoff  through  a  finite  horizon  computation.  However,  as  discussed  in  chapter  5  of  Bertsekas 
and  Shreve  (1978),  such  a  property  is  not  always  assured,  and  simple  counterexamples  can  be 
constructed  in  the  deterministic  case.  Moreover,  the  existence  of  optimal  policies  in  infinite  horizon 
zero-sum games can be a rather subtle and non-trivial issue when the payoff is not discounted (see
for example Kumar and Shiau, 1981; Nowak, 1985), as in the case of probabilistic safety and
reach-avoid problems.

We will address these questions within the context of the probabilistic reach-avoid problem,
with the understanding that results for the safety problem can be specialized from the particular
case of the reach-avoid problem in which the objective is to minimize the probability of reaching
a target or unsafe set. It will be shown that the infinite horizon reach-avoid probability is a fixed
point of the dynamic programming operator $\mathcal{T}$ defined in section 4.4. Furthermore, this infinite
horizon payoff can be approximated by the finite horizon dynamic programming procedure. In the
case that the objective of the control is to maximize the reach-avoid probability, it is also shown
that there exists a stationary worst-case adversary strategy. However, consistent with results
in the literature (Kumar and Shiau, 1981; Nowak, 1985), the corresponding result for the control is
comparatively weaker. In particular, it is shown that there exists a time-varying $\epsilon$-optimal
semi-Markov control policy. In the reverse case that the objective of the control is to minimize the
reach-avoid probability, as in the case of the safety problem, the results are correspondingly reversed.
Namely, in such a case, there exists a stationary max-min control policy.

For a precise statement of the infinite horizon reach-avoid problem, let $\mu = (\mu_0, \mu_1, \ldots) \in
\mathcal{M}_a$ be an infinite horizon Markov policy for player I and let $\gamma = (\gamma_0, \gamma_1, \ldots) \in \Gamma_b$ be an infinite
horizon Markov strategy for player II. Then by Proposition 7.28 of Bertsekas and Shreve (1978),
the stochastic kernels $\tau^{\mu_k,\gamma_k}$, $k = 0, 1, \ldots$ induce a unique probability measure $P^{\mu,\gamma}_{s_0}$ on the sample
space $\Omega = \prod_{k=0}^{\infty} S$. Under a given $\mu \in \mathcal{M}_a$ and $\gamma \in \Gamma_b$, the infinite horizon reach-avoid probability
is defined as

$$r^{\mu,\gamma}_{s_0}(R,W') := P^{\mu,\gamma}_{s_0}\big(\{(s_0, s_1, \ldots) : \exists k \ge 0,\ (s_k \in R) \wedge (s_j \in W',\ \forall j \in [0,k])\}\big). \tag{4.41}$$


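The above expression can be equivalently written as

$$\begin{aligned}
r^{\mu,\gamma}_{s_0}(R,W') &= P^{\mu,\gamma}_{s_0}\Big(\bigcup_{k=0}^{\infty} (W' \setminus R)^k \times R\Big) = \sum_{k=0}^{\infty} P^{\mu,\gamma}_{s_0}\big((W' \setminus R)^k \times R\big) \\
&= \lim_{N \to \infty} \sum_{k=0}^{N} E^{\mu,\gamma}_{s_0}\Big[\Big(\prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j)\Big) 1_R(s_k)\Big] \\
&= \lim_{N \to \infty} r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W'),
\end{aligned} \tag{4.42}$$

where $\mu_{0 \to N} = (\mu_0, \ldots, \mu_{N-1})$ and $\gamma_{0 \to N} = (\gamma_0, \ldots, \gamma_{N-1})$ denote the player I policy and player II
strategy, respectively, over time horizon $[0,N]$. In other words, under a fixed infinite horizon policy
$\mu$ and a fixed infinite horizon strategy $\gamma$, the infinite horizon reach-avoid probability is the limit of
the finite horizon reach-avoid probability as $N \to \infty$.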
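The partial sums in (4.42) are straightforward to evaluate once both players' choices are fixed and the closed-loop system is a finite Markov chain. The sketch below does this for a hypothetical three-state chain whose transition matrix is made up purely for illustration; the function name `partial_sums` and the state indexing are assumptions as well.

```python
def partial_sums(P, s0, R, W, N):
    """Partial sums over k = 0..N of P(s_0, ..., s_{k-1} in W' minus R, s_k in R).

    P is a row-stochastic matrix (list of lists), s0 the initial state index,
    R the target set and W the safe set W' (both sets of state indices).
    """
    n = len(P)
    # dist[i]: probability of being in state i now, having stayed in W' minus R so far
    dist = [1.0 if i == s0 else 0.0 for i in range(n)]
    total, sums = 0.0, []
    for _ in range(N + 1):
        total += sum(dist[i] for i in R)          # mass that has just entered R
        sums.append(total)
        alive = [dist[i] if (i in W and i not in R) else 0.0 for i in range(n)]
        dist = [sum(alive[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sums

# Hypothetical chain: state 0 is the target, state 1 is unsafe, and state 2
# moves to the target with probability 0.1 and otherwise stays put.
P = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.1, 0.0, 0.9]]
sums = partial_sums(P, s0=2, R={0}, W={0, 2}, N=50)
print(sums[:4], sums[-1])
```

For this chain the partial sum up to $N$ equals $1 - 0.9^N$, so the sequence is nondecreasing and approaches one, which illustrates the limit in (4.42).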

As in section 4.3, we define the infinite horizon worst-case reach-avoid probability under an
infinite horizon player I policy $\mu \in \mathcal{M}_a$ as

$$r^{\mu}_{s_0}(R,W') = \inf_{\gamma \in \Gamma_b} r^{\mu,\gamma}_{s_0}(R,W'), \quad s_0 \in S. \tag{4.43}$$

The infinite horizon max-min payoff for player I is then given by

$$r^{\infty}_{s_0}(R,W') := \sup_{\mu \in \mathcal{M}_a} r^{\mu}_{s_0}(R,W'), \quad s_0 \in S. \tag{4.44}$$

The max-min control policy and the worst-case adversary strategy are then interpreted as the
Stackelberg solution to (4.43) and (4.44). Given that an optimal policy for player I may not exist in the
infinite horizon case (Kumar and Shiau, 1981), we will widen the notion of optimality to $\epsilon$-optimal
policies. In particular, a control policy $\mu^{\epsilon} \in \mathcal{M}_a$ is said to be an $\epsilon$-optimal max-min control policy
if

$$r^{\mu^{\epsilon}}_{s_0}(R,W') \ge r^{\infty}_{s_0}(R,W') - \epsilon, \quad \forall s_0 \in S.$$

Clearly, a max-min control policy is optimal if it is 0-optimal. The definition for the worst-case
adversary strategy remains the same as in the finite horizon case.

The  infinite  horizon  reach-avoid  problem  for  a  DTSHG  is  stated  as  follows. 

Problem 4.4. Given a DTSHG $\mathcal{H}$, target set $R \in \mathcal{B}(S)$, and safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

(I) Compute the infinite horizon max-min reach-avoid probability $r^{\infty}_{s_0}(R,W')$, $\forall s_0 \in S$;

(II) For a choice of $\epsilon > 0$ such that an $\epsilon$-optimal max-min control policy $\mu^{\epsilon} \in \mathcal{M}_a$ exists, find
such a policy;

(III) Find a worst-case adversary strategy $\gamma^* \in \Gamma_b$, whenever it exists.

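In the following, it will be shown that the infinite horizon max-min reach-avoid probability is
in fact a fixed point of the dynamic programming operator $\mathcal{T}$, and that it can be approximated by
the finite horizon reachability computation as described in section 4.4.1. In particular, defining the
function $V^* : S \to [0,1]$ as $V^*(s_0) := r^{\infty}_{s_0}(R,W')$, $s_0 \in S$, we will show that

$$V^* = \mathcal{T}(V^*). \tag{4.45}$$

Moreover, defining the finite horizon max-min reach-avoid probability over $[0,N]$ as

$$r^{N}_{s_0}(R,W') := \sup_{\mu_{0 \to N}} \inf_{\gamma_{0 \to N}} r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W'),$$

we will show that

$$r^{\infty}_{s_0}(R,W') = \lim_{N \to \infty} r^{N}_{s_0}(R,W'), \quad \forall s_0 \in S. \tag{4.46}$$

By (4.42), and the definitions of $r^{\infty}_{s_0}(R,W')$ and $r^{N}_{s_0}(R,W')$, this is equivalent to showing

$$\sup_{\mu \in \mathcal{M}_a} \inf_{\gamma \in \Gamma_b} \lim_{N \to \infty} r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W') = \lim_{N \to \infty} \sup_{\mu_{0 \to N}} \inf_{\gamma_{0 \to N}} r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W').$$

In other words, the limit can be exchanged with the supremum and infimum.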

We begin by proving that the limit in (4.46) in fact exists.

Lemma 4.5. For each $s_0 \in S$, the sequence $\{r^{N}_{s_0}(R,W')\}_{N=1}^{\infty}$ converges.

Proof. For each $N \ge 1$, $r^{N}_{s_0}(R,W')$ is the finite horizon max-min reach-avoid probability over $[0,N]$.
Thus, for every $s_0 \in S$ and $N \ge 1$, $r^{N}_{s_0}(R,W') \in [0,1]$.

By Theorem 4.1, we have that for each $N \ge 1$ and $s_0 \in S$, $r^{N}_{s_0}(R,W') = \mathcal{T}^N(1_R)(s_0)$. From the
definition of $\mathcal{T}$ in equation (4.12), it is clear that $1_R \le \mathcal{T}(1_R)$. Furthermore, by the properties
of integrals, it follows directly that the operator $\mathcal{T}$ satisfies a monotonicity property: if $J, J' \in \mathcal{F}$
are value functions such that $J \le J'$, then $\mathcal{T}(J) \le \mathcal{T}(J')$. Given these properties of $\mathcal{T}$, it can be
verified that $\mathcal{T}^k(1_R) \le \mathcal{T}^{k+1}(1_R)$ for every $k \ge 0$.

From this, we conclude that for each $s_0 \in S$, the sequence $\{r^{N}_{s_0}(R,W')\}_{N=1}^{\infty}$ is bounded and
monotonically nondecreasing, and hence converges (see for example Rudin, 1976, Theorem 3.14). □
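The monotone convergence argued in this proof can be observed numerically on a small game. The sketch below applies a finite-state, finite-action version of the sup-inf operator repeatedly to $1_R$; the three-state kernel, the input grid standing in for $C_a$, and all function names are hypothetical choices made for illustration rather than the construction used in the text.

```python
def T(J, R, W, A, B, tau):
    """One application of the sup-inf operator on a finite state space:
    value 1 on R, value 0 outside W', and max over a of min over b of the
    expected value of J at the next state on the remaining states."""
    out = []
    for s in range(len(J)):
        if s in R:
            out.append(1.0)
        elif s in W:
            out.append(max(
                min(sum(p * J[j] for j, p in enumerate(tau(s, a, b))) for b in B)
                for a in A))
        else:
            out.append(0.0)
    return out

def tau(s, a, b):
    """Hypothetical kernel: states 0 and 1 are absorbing, state 2 is contested."""
    if s != 2:
        return [1.0 if j == s else 0.0 for j in range(3)]
    return [a, 1.0 - a, 0.0] if b == 1 else [1.0 - a, 0.0, a]

R, W = {0}, {0, 2}
A = [i / 10 for i in range(11)]   # finite grid standing in for C_a
B = [1, 2]

J = [1.0 if s in R else 0.0 for s in range(3)]   # start from J = 1_R
values = []
for N in range(1, 21):
    J = T(J, R, W, A, B, tau)
    values.append(J[2])          # N-step value at the contested state
print(values[:5])
```

The printed values never decrease and stay within $[0,1]$, as the lemma predicts; because the input set is discretized, they only approximate the values of the underlying continuous-input game.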

For notational convenience, we define a function $V_\infty : S \to [0,1]$ as

$$V_\infty(s_0) = \lim_{N \to \infty} r^{N}_{s_0}(R,W'), \quad \forall s_0 \in S. \tag{4.47}$$

By Proposition 4.1, it follows that $V_\infty$ is the limit of a sequence of Borel-measurable functions, and
hence is also Borel-measurable (see for example Folland, 1999, Proposition 2.7). The following
result shows that $V_\infty$ is a fixed point of the operator $\mathcal{T}$.

Proposition 4.7. Let $V_\infty$ be defined as in (4.47) and $\mathcal{T}$ be defined as in (4.12). Then $V_\infty$ satisfies
the fixed-point equation

$$V_\infty = \mathcal{T}(V_\infty).$$

The proof is somewhat technical and can be found in appendix B. The line of argument is
adapted from that found in Kumar and Shiau (1981) for additive cost problems. In the following,
we use this result to prove (4.46).

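Proposition 4.8. Let $V_\infty$ be defined as in (4.47). Then

$$r^{\infty}_{s_0}(R,W') = V_\infty(s_0), \quad \forall s_0 \in S.$$

Furthermore, the function $V^* : S \to [0,1]$ defined as $V^*(s_0) := r^{\infty}_{s_0}(R,W')$, $s_0 \in S$, satisfies the
fixed-point equation (4.45).

Proof. It can be observed from equation (4.42) that, for any fixed infinite horizon policy $\mu =
(\mu_0, \mu_1, \ldots) \in \mathcal{M}_a$ and strategy $\gamma = (\gamma_0, \gamma_1, \ldots) \in \Gamma_b$, the following inequality holds:

$$r^{\mu,\gamma}_{s_0}(R,W') \ge r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W'), \quad \forall s_0 \in S,\ N \in \mathbb{N}.$$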

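This implies that, for every $s_0 \in S$ and $N \ge 1$, we have

$$\begin{aligned}
r^{\infty}_{s_0}(R,W') &= \sup_{\mu \in \mathcal{M}_a} \inf_{\gamma \in \Gamma_b} r^{\mu,\gamma}_{s_0}(R,W') \\
&\ge \sup_{\mu_{0 \to N}} \inf_{\gamma_{0 \to N}} r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W') \\
&= r^{N}_{s_0}(R,W').
\end{aligned}$$

It then follows that

$$r^{\infty}_{s_0}(R,W') \ge \lim_{N \to \infty} r^{N}_{s_0}(R,W') = V_\infty(s_0), \quad \forall s_0 \in S.$$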

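In order to prove the reverse inequality, we define the functions $J_k^{\mu_{k \to N},\gamma_{k \to N}} : S \to [0,1]$ by

$$J_k^{\mu_{k \to N},\gamma_{k \to N}}(s_k) := r^{\mu_{k \to N},\gamma_{k \to N}}_{s_k}(R,W'), \quad s_k \in S,$$

for $\mu \in \mathcal{M}_a$, $\gamma \in \Gamma_b$, and $k = 0, 1, \ldots, N-1$.

For fixed choices of Borel-measurable functions $f : S \to C_a$ and $g : S \times C_a \to C_b$, let the operator
$\mathcal{T}_{f,g}$ be defined as in (4.17). Then by Lemma 4.1, for any finite horizon $[0,N]$, $\mu \in \mathcal{M}_a$, and $\gamma \in \Gamma_b$,
the functions $J_k^{\mu_{k \to N},\gamma_{k \to N}}$ can be computed through the backwards recursion

$$J_k^{\mu_{k \to N},\gamma_{k \to N}} = \mathcal{T}_{\mu_k,\gamma_k}\big(J_{k+1}^{\mu_{k+1 \to N},\gamma_{k+1 \to N}}\big), \quad k = 0, 1, \ldots, N-1,$$

initialized with $J_N^{\mu_{N \to N},\gamma_{N \to N}} = 1_R$.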

Furthermore, by Proposition 4.1, there exists a Borel-measurable function $g^* : S \times C_a \to C_b$
which satisfies, for every $s \in S$ and $a \in C_a$,

$$g^*(s,a) \in \arg\inf_{b \in C_b} H(s,a,b,V_\infty).$$

Thus, for any Borel-measurable function $f : S \to C_a$ and $s \in S$, we have

$$\begin{aligned}
\mathcal{T}_{f,g^*}(V_\infty)(s) &= 1_R(s) + 1_{W' \setminus R}(s)\, H(s, f(s), g^*(s,f(s)), V_\infty) \\
&= \inf_{b \in C_b} 1_R(s) + 1_{W' \setminus R}(s)\, H(s, f(s), b, V_\infty) \\
&\le \mathcal{T}(V_\infty)(s).
\end{aligned}$$

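Consider a stationary Markov strategy $\gamma^* := (g^*, g^*, \ldots)$. We prove the following claim by
backwards induction on $k$: for any $\mu = (\mu_0, \mu_1, \ldots) \in \mathcal{M}_a$ and $N \ge 1$,

$$J_k^{\mu_{k \to N},\gamma^*_{k \to N}} \le V_\infty, \quad k = 0, 1, \ldots, N.$$

For $k = N$, it can be observed that $J_N^{\mu_{N \to N},\gamma^*_{N \to N}} = 1_R \le V_\infty$. For the inductive step, we assume the
inequality holds for some $k = l \in \{1, \ldots, N\}$. Then for any $\mu \in \mathcal{M}_a$, the monotonicity of the operator
and the inequality established above give

$$J_{l-1}^{\mu_{l-1 \to N},\gamma^*_{l-1 \to N}} = \mathcal{T}_{\mu_{l-1},g^*}\big(J_l^{\mu_{l \to N},\gamma^*_{l \to N}}\big) \le \mathcal{T}_{\mu_{l-1},g^*}(V_\infty) \le \mathcal{T}(V_\infty).$$

By Proposition 4.7, it follows that $J_{l-1}^{\mu_{l-1 \to N},\gamma^*_{l-1 \to N}} \le V_\infty$, which concludes the proof of the claim.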
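This result implies that for every $s_0 \in S$, $\mu \in \mathcal{M}_a$, and $N \ge 1$,

$$r^{\mu_{0 \to N},\gamma^*_{0 \to N}}_{s_0}(R,W') = J_0^{\mu_{0 \to N},\gamma^*_{0 \to N}}(s_0) \le V_\infty(s_0).$$

Taking the limit as $N \to \infty$, it follows that

$$\lim_{N \to \infty} r^{\mu_{0 \to N},\gamma^*_{0 \to N}}_{s_0}(R,W') = r^{\mu,\gamma^*}_{s_0}(R,W') \le V_\infty(s_0),$$

for every $s_0 \in S$ and $\mu \in \mathcal{M}_a$. Thus,

$$r^{\infty}_{s_0}(R,W') = \sup_{\mu \in \mathcal{M}_a} \inf_{\gamma \in \Gamma_b} r^{\mu,\gamma}_{s_0}(R,W') \le V_\infty(s_0), \quad \forall s_0 \in S.$$

Combining this with the previous inequality, we conclude that

$$r^{\infty}_{s_0}(R,W') = V_\infty(s_0), \quad \forall s_0 \in S,$$

and hence $V^* = V_\infty$. By another application of Proposition 4.7, we have that $V^*$ is a fixed point of
the operator $\mathcal{T}$. □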

In  the  course  of  proving  Proposition  4.8,  we  have  also  shown  the  following. 

Corollary 4.1. There exists a stationary worst-case adversary strategy. In particular, if $\gamma^* =
(g^*, g^*, \ldots) \in \Gamma_b$ satisfies

$$g^*(s,a) \in \arg\inf_{b \in C_b} H(s,a,b,V_\infty), \quad \forall s \in W' \setminus R,\ a \in C_a,$$

then $\gamma^*$ is a worst-case adversary strategy.

In contrast with the finite horizon case, the existence of an optimal max-min control policy
is not assured, due to the positive non-discounted payoff structure. The following example,
adapted from Example 1 of Kumar and Shiau (1981), provides an illustration of this fact.

Consider a Markov decision process with the state space $S = \{q_1, q_2, q_3\}$ and action spaces
$C_a = [0,1]$, $C_b = \{1,2\}$. The states $q_1$ and $q_2$ are absorbing, namely $\tau(q_1 \mid q_1,a,b) = \tau(q_2 \mid q_2,a,b) = 1$,
$\forall a \in C_a$, $b \in C_b$. In state $q_3$, if player I chooses $a \in [0,1]$ and player II chooses $b = 1$, then the
system transitions to $q_1$ with probability $a$ and to $q_2$ with probability $1 - a$; on the other hand, if
player II chooses $b = 2$, then the system transitions to $q_1$ with probability $1 - a$ and remains in $q_3$
with probability $a$. This is illustrated in Figure 4.2.

Suppose $q_1$ is a target state and $q_2$ is an unsafe state, so that $R = \{q_1\}$, $W' = \{q_1, q_3\}$. By
Proposition 4.8, the infinite horizon max-min reach-avoid probability is a fixed point of the equation
$V^* = \mathcal{T}(V^*)$:

$$V^*(q) = 1_{\{q_1\}}(q) + \max_{a \in [0,1]} \min_{b \in \{1,2\}} 1_{\{q_3\}}(q) \sum_{q' \in S} V^*(q')\, \tau(q' \mid q,a,b).$$

Clearly, $V^*(q_1) = 1$ and $V^*(q_2) = 0$. In the case of $q_3$, the above can be rewritten as

$$V^*(q_3) = \max_{a \in [0,1]} \min_{b \in \{1,2\}} \tau(q_1 \mid q_3,a,b) + V^*(q_3)\, \tau(q_3 \mid q_3,a,b).$$

Figure 4.2: Markov chain example to illustrate infinite horizon policies.


It can be verified that the right-hand side of the above equation is given by $\frac{1}{2 - V^*(q_3)}$, which results
in the fixed point $V^*(q_3) = 1$. However, there does not exist a Markov policy for player I which
ensures that the objective of reaching $q_1$ from $q_3$ while avoiding $q_2$ can be achieved with probability
one. Specifically, applying Corollary 4.1, we have that a worst-case adversary strategy at state $q_3$ is
given by $\gamma^*(q_3,a) = 2$ if $a = 1$, and $\gamma^*(q_3,a) = 1$ otherwise. Under this choice of player II strategy,
suppose that player I were to choose $\mu_k(q_3) = 1$, $\forall k \ge 0$; then $r^{\mu,\gamma^*}_{q_3}(R,W') = 0$. On the other hand,
suppose that player I were to choose $\mu_l(q_3) = 1 - \epsilon$ for some $l \ge 0$ and $\epsilon > 0$; then at time $l$, the
system state can transition from $q_3$ to $q_2$ with probability $\epsilon$, and hence $r^{\mu,\gamma^*}_{q_3}(R,W') \le 1 - \epsilon$. This
shows that there does not exist an optimal max-min control policy. In particular, observe that the
policy $\mu^*(q_3) = 1$ satisfies

$$\mu^*(q_3) \in \arg\max_{a \in [0,1]} \min_{b \in \{1,2\}} \sum_{q' \in S} V^*(q')\, \tau(q' \mid q_3,a,b).$$

However, as shown above, choosing this as a stationary policy results in a reach-avoid probability
strictly less than $V^*(q_3)$ (in fact, zero). There does exist, however, a stationary $\epsilon$-optimal policy,
namely $\mu_k(q_3) = 1 - \epsilon$, $\forall k \ge 0$.
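The behavior of stationary policies in this example can be explored numerically. The sketch below fixes a stationary input $a$ for player I at $q_3$ and lets player II minimize step by step over a finite horizon, using the transition probabilities of the example; the function name and the choice of horizon are illustrative assumptions, and the output is a sanity check on long finite horizons rather than a proof about the infinite horizon problem.

```python
def worst_case_value(a, horizon):
    """Worst-case reach-avoid probability from q3 over `horizon` steps when
    player I holds the stationary input `a` and player II minimizes at each step."""
    J = 0.0  # terminal value at q3, since q3 is not in the target set
    for _ in range(horizon):
        via_b1 = a                      # b = 1: reach q1 w.p. a, otherwise q2
        via_b2 = (1.0 - a) + a * J      # b = 2: reach q1 w.p. 1 - a, otherwise stay in q3
        J = min(via_b1, via_b2)
    return J

for a in (1.0, 0.99, 0.9):
    print(a, worst_case_value(a, horizon=1000))
```

The input $a = 1$ yields zero, whereas $a = 1 - \epsilon$ yields approximately $1 - \epsilon$, in line with the discussion of optimal and $\epsilon$-optimal stationary policies above.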

The above example motivates the search for conditions under which there exist $\epsilon$-optimal
policies for player I. For this, we enlarge the set of control policies to those of the form $\mu =
(\mu_0(s_0), \mu_1(s_0,s_1), \mu_2(s_0,s_2), \ldots)$, where $\mu_k$ depends upon both the current state $s_k$ and the
initial condition $s_0$. These policies are sometimes referred to in the literature as semi-Markov policies.
The following result is an extension of Proposition 9.20 in Bertsekas and Shreve (1978) from the
single-player case to the zero-sum game case.

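Proposition 4.9. For every $\epsilon > 0$, there exists an $\epsilon$-optimal semi-Markov max-min control policy.

Proof. By Proposition 4.8, we have $r^{\infty}_{s_0}(R,W') = \lim_{N \to \infty} r^{N}_{s_0}(R,W')$, $\forall s_0 \in S$. For a given $\epsilon > 0$,
define the Borel sets $S_N^{\epsilon} \subseteq S$ by

$$S_N^{\epsilon} = \{s \in S : r^{N}_{s}(R,W') \ge r^{\infty}_{s}(R,W') - \epsilon\}.$$

From Lemma 4.5, it can be inferred that $S_1^{\epsilon} \subseteq S_2^{\epsilon} \subseteq \cdots$, and that $\bigcup_{N \ge 1} S_N^{\epsilon} = S$. By Theorem 4.1,
there exists a Markov policy $\mu^N \in \mathcal{M}_a$ for player I which satisfies

$$r^{N}_{s_0}(R,W') \le r^{\mu^N,\gamma}_{s_0}(R,W'), \quad \forall s_0 \in S,\ \gamma \in \Gamma_b.$$

Then for any initial condition $s_0 \in S_N^{\epsilon}$, we have that

$$r^{\mu^N,\gamma}_{s_0}(R,W') \ge r^{\infty}_{s_0}(R,W') - \epsilon, \quad \forall \gamma \in \Gamma_b.$$

Now consider a policy $\hat{\mu}^N = (\mu^N, \mu', \mu', \ldots)$, where $\mu' : S \to C_a$ is arbitrary; then $\hat{\mu}^N$ is an $\epsilon$-optimal
policy on $S_N^{\epsilon}$. Define a semi-Markov policy $\mu^{\epsilon}$ by $\mu^{\epsilon} = \hat{\mu}^1$ on $S_1^{\epsilon}$ and $\mu^{\epsilon} = \hat{\mu}^j$ on $S_j^{\epsilon} \setminus S_{j-1}^{\epsilon}$, $j \ge 2$.
Then $\mu^{\epsilon}$ is the required policy. □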

In practice, one can implement the semi-Markov policy as follows. Suppose that one would like
to ensure a reach-avoid probability of at least $1 - \epsilon$ over some set of initial conditions $S_0 \subseteq S$. Then
one can perform a finite horizon reach-avoid calculation until a time instant $N$ such that $S_0 \subseteq S_N^{\epsilon}$,
and apply the finite horizon optimal control policy $\mu^N$ on $S_0$. In the previous example, let $V_N(s_0) :=
r^{N}_{s_0}(R,W')$, $\forall s_0 \in S$; then it can be verified that $V_1(q_3) = \frac{1}{2}$ and $V_{k+1}(q_3) = \frac{1}{2 - V_k(q_3)}$ for $k \ge 1$.
Furthermore, the finite horizon optimal policy for player I takes the form $\mu^N = (\mu^N_0, \ldots, \mu^N_{N-1})$,
where $\mu^N_k(q_3) = V_{N-k}(q_3)$. For a given $\epsilon > 0$, one could then choose an integer $N$ sufficiently large
such that $V_N(q_3) \ge 1 - \epsilon$ and implement the policy $\mu^N$ on $q_3$.
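The recursion for the example can be carried out exactly with rational arithmetic. The following sketch computes the values $V_k(q_3)$, the smallest horizon meeting a given tolerance, and the corresponding finite horizon inputs at $q_3$; the function names are hypothetical, and the convention $V_0(q_3) = 0$ is an assumption chosen so that the recursion reproduces $V_1(q_3) = \frac{1}{2}$.

```python
from fractions import Fraction

def finite_horizon_values(N):
    """Return [V_0, V_1, ..., V_N] at q3, with V_0 = 0 and V_{k+1} = 1 / (2 - V_k)."""
    V = [Fraction(0)]
    for _ in range(N):
        V.append(1 / (2 - V[-1]))
    return V

def horizon_for(eps):
    """Smallest N with V_N(q3) >= 1 - eps."""
    N, v = 0, Fraction(0)
    while v < 1 - eps:
        v = 1 / (2 - v)
        N += 1
    return N

def policy(N):
    """Inputs mu_k^N(q3) = V_{N-k}(q3) for k = 0, ..., N-1."""
    V = finite_horizon_values(N)
    return [V[N - k] for k in range(N)]

N = horizon_for(Fraction(3, 20))       # tolerance eps = 0.15
print(N, finite_horizon_values(N)[-1])
print(policy(3))
```

The recursion gives $V_N(q_3) = N/(N+1)$, so the required horizon grows roughly like $1/\epsilon$, and the computed inputs decrease toward $\frac{1}{2}$ as the end of the horizon approaches.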

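By the relation between the probabilistic safety and reach-avoid problems as observed in section
4.3, the infinite horizon dynamic programming results for the safety problem can be derived in
a fashion analogous to Propositions 4.7 and 4.8, except replacing the sup-inf dynamic programming
operator $\mathcal{T}$ by an inf-sup dynamic programming operator. More specifically, let $W \in \mathcal{B}(S)$
be a safe set; then the infinite horizon safety probability under fixed choices of $\mu \in \mathcal{M}_a$ and $\gamma \in \Gamma_b$
is given by

$$\begin{aligned}
p^{\mu,\gamma}_{s_0}(W) &:= P^{\mu,\gamma}_{s_0}\big(\{(s_0, s_1, \ldots) : s_k \in W,\ \forall k \ge 0\}\big) \\
&= 1 - P^{\mu,\gamma}_{s_0}\big(\{(s_0, s_1, \ldots) : \exists k \ge 0,\ (s_k \in S \setminus W)\}\big) \\
&= 1 - r^{\mu,\gamma}_{s_0}(S \setminus W, S),
\end{aligned} \tag{4.48}$$

where $r^{\mu,\gamma}_{s_0}$ is as defined in (4.41). The infinite horizon max-min safety probability is then given by

$$p^{\infty}_{s_0}(W) := \sup_{\mu \in \mathcal{M}_a} \inf_{\gamma \in \Gamma_b} p^{\mu,\gamma}_{s_0}(W) = 1 - \check{r}^{\infty}_{s_0}(S \setminus W, S), \quad s_0 \in S, \tag{4.49}$$

where

$$\check{r}^{\infty}_{s_0}(R,W') := \inf_{\mu \in \mathcal{M}_a} \sup_{\gamma \in \Gamma_b} r^{\mu,\gamma}_{s_0}(R,W'), \quad s_0 \in S,\ R, W' \in \mathcal{B}(S). \tag{4.50}$$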

The safety problem then becomes one of computing the infinite horizon minimal reach-avoid
probability $\check{r}^{\infty}_{s_0}(R,W')$ with $R = S \setminus W$ and $W' = S$, and finding an optimal control policy $\mu^* \in \mathcal{M}_a$ as
interpreted in terms of a Stackelberg solution to (4.49).

Now consider the finite horizon minimal reach-avoid probability defined as

$$\check{r}^{N}_{s_0}(R,W') := \inf_{\mu_{0 \to N}} \sup_{\gamma_{0 \to N}} r^{\mu_{0 \to N},\gamma_{0 \to N}}_{s_0}(R,W'),$$

and an inf-sup dynamic programming operator defined as

$$\check{\mathcal{T}}(J)(s) = \inf_{a \in C_a} \sup_{b \in C_b} 1_R(s) + 1_{W' \setminus R}(s)\, H(s,a,b,J), \quad s \in S,\ J \in \mathcal{F}.$$

The following results can then be shown using a procedure analogous to that given in the proofs
of Propositions 4.7 and 4.8.

Proposition 4.10. Let $V_\infty : S \to [0,1]$ be defined as $V_\infty(s_0) = \lim_{N \to \infty} \check{r}^{N}_{s_0}(R,W')$ and $V^* : S \to [0,1]$
be defined as $V^*(s_0) := \check{r}^{\infty}_{s_0}(R,W')$. Then

(a) $V^* = \check{\mathcal{T}}(V^*)$;

(b) $V^* = V_\infty$;

(c) There exists a stationary optimal control policy. In particular, if $\mu^* = (f^*, f^*, \ldots) \in \mathcal{M}_a$ satisfies

$$f^*(s) \in \arg\inf_{a \in C_a} \sup_{b \in C_b} H(s,a,b,V_\infty), \quad \forall s \in W' \setminus R,$$

then $\mu^*$ is an optimal control policy.

The  infinite  horizon  safety  probability  can  be  then  derived  from  this  proposition  by  setting 
R  —  S  \  W  and  W'  =  S.  It  should  be  noted,  however,  that  if  the  noise  distribution  in  the  DTSHG 
model  has  infinite  support,  this  probability  will  be  in  general  zero  everywhere.  Namely,  given 
enough  time,  the  system  trajectory  will  eventually  become  unsafe.  On  the  other  hand,  if  one  were 
to  consider  noise  distributions  with  bounded  support  or  alternative  interpretations  of  the  safety 
problem  as  the  probability  of  reaching  a  safe  set  before  reaching  the  unsafe  set,  for  example  by 
choosing  W'  in  (4.50)  as  a  strict  subset  of  S  (Hu  et  al.,  2005),  then  it  may  be  the  case  that  the  infinite 
horizon  safety  probability  would  no  longer  be  identically  zero  and  as  such  would  be  meaningful  to 
compute. 
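The decay to zero can be made concrete under a simplifying assumption that is not stated in the text: suppose there is a constant $\delta > 0$ such that, from every state in $W$ and for every pair of player inputs, the next state lies outside $W$ with probability at least $\delta$. Writing $p^{N}_{s_0}(W)$ for the max-min safety probability over $[0,N]$ (a notation assumed here for illustration), conditioning on one step at a time gives the following bound.

```latex
p^{N}_{s_0}(W) \;\le\; (1-\delta)\, \sup_{s \in W} p^{N-1}_{s}(W)
\;\le\; \cdots \;\le\; (1-\delta)^{N}
\;\xrightarrow[N \to \infty]{}\; 0 .
```

A noise distribution with bounded support need not satisfy this assumption, since there may be states from which the unsafe set cannot be reached in one step; this is consistent with the remark that the infinite horizon safety probability can then be nonzero.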

4.7  Computational  Examples 

In  this  section,  we  will  illustrate  probabilistic  reachability  computation  for  DTSHG  models  through 
two  application  examples.  In  particular,  the  examples  are  stochastic  game  formulations  of  the 
aircraft  conflict  resolution  and  quadrotor  target  tracking  examples  considered  previously  in  chapter 
3  within  the  context  of  deterministic  hybrid  system  models.  The  discussion  here  will  focus  on  the 
motivations  for  stochastic  models,  the  computation  of  the  max-min  probability  and  control  policy, 
as  well  as  the  interpretation  of  the  dynamic  programming  solutions  in  terms  of  the  application  of 
interest.

4.7.1 Aircraft Conflict Detection and Resolution


First  consider  the  problem  of  two  aircraft  conflict  resolution.  As  described  previously  in  section 
3.2,  the  relative  position  and  heading  dynamics  between  the  two  aircraft  in  the  conflict  scenario 
can  be  abstracted  in  terms  of  a  deterministic,  nonlinear  kinematics  model,  with  the  input  of  aircraft 
1  as  the  control  and  the  input  of  aircraft  2  as  the  disturbance.  A  source  of  uncertainty  which  is 
not  captured  in  this  model,  however,  is  the  effects  of  wind,  which  can  cause  significant  trajectory 
tracking  errors.  Such  effects  are  difficult  to  model  deterministically  as  they  tend  to  exhibit  large 
fluctuations  from  one  scenario  to  another.  Thus,  they  are  often  characterized  empirically  through 
statistical  analysis  of  aircraft  trajectory  data  compiled  over  a  large  number  of  flights  (Ballin  and 
Erzberger,  1996).  This  motivates  the  consideration  of  a  probabilistic  model  of  wind  to  augment 
the  aircraft  kinematics  model. 

The field of conflict detection and resolution in air traffic management features a large number
of formulations and computational methods. For a comprehensive survey, the interested reader
is referred to Kuchar and Yang (2000). Our approach to this problem lies at the intersection of
worst-case (Tomlin et al., 2002) and probabilistic methods (Paielli and Erzberger, 1997): the
intent of one of the aircraft is assumed to be unknown and possibly adversarial, while the wind
effect on the aircraft trajectory is modeled as stochastic noise. In this context, conflict detection and
resolution becomes a probabilistic safety problem in which the control task is to maximize the
probability of avoiding a collision between two aircraft.

We will briefly review some previous work in probabilistic conflict detection and resolution.
One of the seminal works in this area is that of Paielli and Erzberger (1997), in which a model
for aircraft trajectory perturbation as Gaussian noise was proposed, based upon the statistical
analysis described in Ballin and Erzberger (1996). This is accompanied by an analytic method for
computing the conflict probability. This formed the basis of several probabilistic conflict detection
methods which followed (Prandini et al., 2000; Hwang and Seah, 2008). As more detailed trajectory
models are considered, with variations in aircraft intent (Yang and Kuchar, 1997) and spatial
correlation in wind effects (Hu et al., 2005), closed-form expressions for the conflict probability
are often no longer available, requiring the use of numerical estimation algorithms. In comparison
with these previous methods, our approach has the flexibility of being able to treat uncertainty in
intent as an adversarial input rather than as a stochastic process, thus offering an interpretation of
the conflict probability we compute as the probability of collision under the worst-case behavior
of one of the aircraft.

To formulate the problem more precisely, let $x = (x_r, y_r, \theta_r) \in S = \mathbb{R}^2 \times [0, 2\pi]$
denote, respectively, the x-position, y-position, and heading of aircraft 2 in the reference frame of
aircraft 1. By performing an Euler discretization of the kinematics equations given in section 3.2 and
augmenting the resulting dynamics with a stochastic wind model as described in Hu et al. (2005), we
obtain the following model for the relative motion between two aircraft:

\[
x(k+1) = f(x(k), \omega_1(k), \omega_2(k)) + w(k)
= \begin{bmatrix}
x_r(k) + \Delta t\,\bigl(-v_1 + v_2 \cos(\theta_r(k)) + \omega_1(k)\, y_r(k)\bigr) \\
y_r(k) + \Delta t\,\bigl(v_2 \sin(\theta_r(k)) - \omega_1(k)\, x_r(k)\bigr) \\
\theta_r(k) + \Delta t\,\bigl(\omega_2(k) - \omega_1(k)\bigr)
\end{bmatrix}
+ \begin{bmatrix} w_1(k) \\ w_2(k) \\ w_3(k) \end{bmatrix}
\tag{4.51}
\]

where $\Delta t$ is the discretization step, $v_i$ is the speed of aircraft $i$ (assumed to be constant),
and $\omega_i$ is the angular turning rate of aircraft $i$, taken to be the inputs to the system. The random
variables $(w_1, w_2)$ model spatially correlated wind effects, with a Gaussian distribution
$(w_1, w_2) \sim \mathcal{N}(0, \Sigma(x_r, y_r))$, as per the wind model proposed in Hu et al. (2005). In
particular, at each planar position $(x_r, y_r) \in \mathbb{R}^2$, the stochastic wind component in the
stochastic differential equation (SDE) model described in Hu et al. (2005) has the distribution
$\sigma_h\, dB(x_r, y_r, t)$, in which $B$ is a position-dependent Brownian motion and $\sigma_h$ is a positive
constant. It is shown there that the wind in relative coordinates has the distribution

\[
\begin{aligned}
w_1(t) &= \sigma_h \sqrt{2\bigl(1 - h(\|(x_r, y_r)\|)\bigr)}\; W_1(t), \\
w_2(t) &= \sigma_h \sqrt{2\bigl(1 - h(\|(x_r, y_r)\|)\bigr)}\; W_2(t),
\end{aligned}
\]

where $h : \mathbb{R} \to \mathbb{R}$ is a continuous decreasing function with $h(0) = 1$ and
$\lim_{c \to \infty} h(c) = 0$, and $W(t) = (W_1(t), W_2(t))$ is a standard Brownian motion. The function $h$
is referred to as the spatial correlation function and is chosen to be $h(c) = \exp(-\beta c)$, where
$\beta$ is a positive constant. The distribution of $(w_1, w_2)$ in (4.51) is then obtained through an
approximation of this distribution over one discretization step $\Delta t$. Finally, the random variable
$w_3$ models process noise acting on the turning rate of either aircraft. It is assumed to have a
Gaussian distribution $w_3 \sim \mathcal{N}(0, (\sigma_\omega \Delta t)^2)$.
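
As a rough illustration of (4.51), the following Python sketch simulates one step of the relative dynamics. Several choices in it are assumptions rather than details taken from the text: time is measured in minutes (so a 15 second step is `dt = 0.25`), turning rates are expressed in radians per minute, the covariance $\Sigma(x_r, y_r)$ is approximated by the variance of the Brownian increment over one step, namely $2\sigma_h^2(1 - h(\|(x_r, y_r)\|))\Delta t$ on each axis with no cross-correlation, and the heading is wrapped to $[0, 2\pi)$. The function name `step_relative_dynamics` is hypothetical.

```python
import numpy as np


def step_relative_dynamics(x, omega1, omega2, rng, dt=0.25, v1=6.0, v2=6.0,
                           sigma_h=0.5, sigma_w=0.35, beta=0.1):
    """One Euler step of the relative kinematics plus sampled noise.

    x = (x_r, y_r, theta_r) in nmi, nmi, rad; omega1, omega2 in rad/min;
    dt in minutes. These unit conventions are assumptions of this sketch.
    """
    xr, yr, th = x
    # Deterministic part f(x, omega1, omega2) of the model.
    f = np.array([
        xr + dt * (-v1 + v2 * np.cos(th) + omega1 * yr),
        yr + dt * (v2 * np.sin(th) - omega1 * xr),
        th + dt * (omega2 - omega1),
    ])
    # Spatial correlation h(c) = exp(-beta * c) evaluated at the separation.
    h = np.exp(-beta * np.hypot(xr, yr))
    # Assumed one-step approximation of the wind standard deviation.
    wind_std = sigma_h * np.sqrt(2.0 * (1.0 - h) * dt)
    w = np.array([
        rng.normal(0.0, wind_std),
        rng.normal(0.0, wind_std),
        rng.normal(0.0, sigma_w * dt),  # turning-rate process noise w_3
    ])
    nxt = f + w
    nxt[2] = np.mod(nxt[2], 2.0 * np.pi)  # keep the heading in [0, 2*pi)
    return nxt
```

Under this unit convention, a turning rate of 1 degree per second corresponds to `np.pi / 3` radians per minute, and setting `sigma_h = sigma_w = 0` recovers the deterministic Euler step.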

Consistent with common flight maneuvers, we consider a scenario in which each aircraft is
allowed to select from among three operation modes: straight flight, right turn, or left turn,
corresponding to the angular turning rates $\omega_i = 0$, $\omega_i = -\omega$, and $\omega_i = \omega$, respectively.
Here, $\omega \in \mathbb{R}$ is assumed to be a constant. The control objective of aircraft 1 is to avoid a disc
$D$ of radius $R_c$ centered on the origin in the $(x_r, y_r)$ plane (corresponding to a loss of minimum
separation), subject to the worst-case inputs of aircraft 2. This can then be viewed as a probabilistic
safety problem with the safe set given as $W = D^c \times [0, 2\pi]$. By the results of section 4.4.3, the
solution to this problem can be obtained from a complementary reach-avoid problem in which the
objective of aircraft 1 is to minimize the worst-case probability of entering the collision set
$S \setminus W$, corresponding to the minimal reach-avoid probability $r^*_{s_0}(S \setminus W, S)$.

For our numerical results, we choose a sampling time of $\Delta t = 15$ seconds, with a time horizon
of 2.5 minutes. The radius of the protected zone is set to $R_c = 5$ nmi; the aircraft speed is set to
$v_1 = v_2 = 6$ nmi per minute; and the angular turning rate is set to $\omega = 1$ degree per second. The
parameters of the probability distributions are chosen as $\sigma_h = 0.5$, $\sigma_\omega = 0.35$, and
$\beta = 0.1$. The value function is computed using a numerical discretization approach, similar to the
one discussed in Abate et al. (2007), on the domain $[-10, 20] \times [-10, 10] \times [0, 2\pi]$, with a grid
size of $121 \times 81 \times 73$.
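
The numbers above imply a particular horizon length and grid resolution. The short Python check below only works out that arithmetic, under the assumption (not stated in the text) that the grid is uniform and includes both endpoints of each interval.

```python
import numpy as np

# Horizon length in time steps: 2.5 minutes at 15 seconds per step.
dt_seconds = 15.0
horizon_seconds = 2.5 * 60.0
n_steps = int(round(horizon_seconds / dt_seconds))

# Assumed uniform grids that include both endpoints of each interval.
xr = np.linspace(-10.0, 20.0, 121)          # relative x-position (nmi)
yr = np.linspace(-10.0, 10.0, 81)           # relative y-position (nmi)
theta = np.linspace(0.0, 2.0 * np.pi, 73)   # relative heading (rad)

dx = xr[1] - xr[0]                           # spacing in x (nmi)
dy = yr[1] - yr[0]                           # spacing in y (nmi)
dtheta_deg = np.degrees(theta[1] - theta[0])  # spacing in heading (degrees)
n_nodes = xr.size * yr.size * theta.size     # total number of grid nodes
```

Under these assumptions the horizon is 10 steps, the planar spacing is 0.25 nmi, the heading spacing is 5 degrees, and the value function is stored at 715,473 nodes per time step.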


Figure 4.3: Probability of conflict for stochastic game formulation of pairwise aircraft conflict
resolution example. (a) Set of $s_0$ with conflict probability of at least 1%. (b) Contours of
$r^*_{s_0}(S \setminus W, S)$ at $\theta_r = \pi/2$ radians.


The set of initial conditions $s_0$ for which the conflict probability is at least 1% (namely, where
$r^*_{s_0}(S \setminus W, S) \geq 0.01$) is shown in Fig. 4.3a. Outside of this set, we have a confidence level of
at least 99% of avoiding a collision over a 2.5 minute time interval. A slice of the worst-case
conflict probability $r^*_{s_0}(S \setminus W, S)$ at a relative heading of $\theta_r = \pi/2$ rad is shown in Fig. 4.3b. In
a conflict detection and resolution algorithm, one can use this probability map to determine the
set of states at which to initiate a conflict resolution maneuver (for example where $r^*$ exceeds a
certain threshold), at which time the max-min policy $\mu^*$ provides a feedback map for selecting
flight maneuvers to minimize the conflict probability. A plot of this policy at a relative heading of
$\theta_r = \pi/2$ rad is shown in Fig. 4.4. As can be observed, when the two aircraft are far apart, one
can choose to fly straight on the intended course. However, as aircraft 2 approaches the boundary
of the set shown in Fig. 4.3a, it becomes necessary for aircraft 1 to perform an evasive maneuver
(turn right for the upper boundary, turn left for the lower boundary).

Figure 4.4: Max-min control policy at a relative heading of $\theta_r = \pi/2$ rad (panel title: Optimal
Flight Maneuver Selections; horizontal axis: relative x-position in nmi). The color scale is as
follows: black = collision set, dark gray = straight, medium gray = right turn, light gray = left turn,
white = either left or right turn.

4.7.2 Target Tracking

Now consider the target tracking problem in which the task specification is to drive an autonomous
quadrotor helicopter into a neighborhood of planar positions over a moving ground vehicle, without
exceeding certain velocity limits. This problem was previously discussed in section 3.6 within
a continuous time robust control framework. We describe here a stochastic formulation of the
problem in which the uncertainties within the system are characterized through a mixture of
deterministic bounds and stochastic noise. The motivation for this is that in an aerial robotics platform
such as STARMAC, the effects of higher order dynamics and actuator noise can often be difficult
to characterize through a deterministic model (Huang et al., 2009). As discussed in section 3.6,
the choice of the disturbance bounds in a deterministic setting is a trade-off between robustness
and feasibility. In a robust control approach, one tends to put conservative bounds on the effects
of these disturbances, thus resulting in conservative control laws or sometimes even the lack of
a control law which satisfies the desired motion planning specifications. To alleviate this
conservatism, one may resort to disturbance bounds which capture the "majority" of the disturbance
behaviors observed in practice. This, however, introduces the risk that the desired specifications
may be violated, as was found in the experimental trials performed on the quadrotor platform. A
probabilistic approach on the one hand provides a method for quantifying this risk, using a
probabilistic model of the noise, while on the other hand allows for a relaxation of the deterministic
reachability specifications.

The model of the system dynamics is obtained through a discretization of the continuous time
dynamics described in section 3.6. Specifically, let $x_1$, $x_2$, $y_1$, $y_2$ denote the position and velocity
of the quadrotor relative to the ground vehicle in the x-axis and y-axis, respectively. Then from
the point of view of a high-level controller, the position-velocity dynamics of the quadrotor in
the planar x and y directions can be modeled as decoupled double integrators, controlled in the
x-direction by the roll angle $\phi$ and in the y-direction by the pitch angle $\theta$. The corresponding
equations of motion in discrete time are given by

\[
\begin{aligned}
x_1(k+1) &= x_1(k) + \Delta t\, x_2(k) + \frac{\Delta t^2}{2}\bigl(g \sin(\phi(k)) + d_x(k)\bigr) + w_1(k), \\
x_2(k+1) &= x_2(k) + \Delta t\,\bigl(g \sin(\phi(k)) + d_x(k)\bigr) + w_2(k), \\
y_1(k+1) &= y_1(k) + \Delta t\, y_2(k) + \frac{\Delta t^2}{2}\bigl(g \sin(-\theta(k)) + d_y(k)\bigr) + w_3(k), \\
y_2(k+1) &= y_2(k) + \Delta t\,\bigl(g \sin(-\theta(k)) + d_y(k)\bigr) + w_4(k).
\end{aligned}
\tag{4.52}
\]

In the above, $\Delta t$ is the discretization step, $g$ is the gravitational acceleration constant, and $d_x$
and $d_y$ are bounded uncertainty terms corresponding to the acceleration of the ground vehicle. The
variables $w_i$, for $i = 1, \dots, 4$, are stochastic uncertainty terms arising from unmodeled dynamics
and actuator noise. The noise variables are modeled using a Gaussian distribution, with
$w_i \sim \mathcal{N}(0, (\sigma_i \Delta t)^2)$. This is based upon a simplifying modeling assumption that the noise
acting on the quadrotor dynamics is generated by the sum of a large number of independent variables,
in which case the Central Limit Theorem applies.
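
To make the structure of (4.52) concrete, the following Python sketch advances one axis of the double integrator by a single step. It is only an illustration: the value `G = 9.81` for the gravitational constant, the default step and noise parameters, the independence of the two noise samples, and the function name `step_axis` are all assumptions of this sketch.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def step_axis(pos, vel, angle_rad, d, rng, dt=0.1, sigma=0.4):
    """One step of the position-velocity dynamics along a single axis.

    For the x-axis pass the roll angle phi; for the y-axis pass -theta,
    matching the sign convention of the equations of motion.
    """
    acc = G * np.sin(angle_rad) + d  # commanded plus disturbance acceleration
    # Independent Gaussian noise samples with standard deviation sigma * dt.
    w_pos, w_vel = rng.normal(0.0, sigma * dt, size=2)
    next_pos = pos + dt * vel + 0.5 * dt**2 * acc + w_pos
    next_vel = vel + dt * acc + w_vel
    return next_pos, next_vel
```

With `sigma = 0` the function reduces to the deterministic double integrator step, which is a convenient way to check the signs and the $\Delta t^2/2$ term.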

Based upon experimental trials, the bounds for the accelerations $d_x$ and $d_y$ of the ground
vehicle are chosen to be $[-0.4, 0.4]$ m/s$^2$, corresponding to about 25% of the maximum allowable
acceleration of the quadrotor. For this example the roll and pitch commands $\phi$ and $\theta$ are selected
from a quantized input range due to digital implementation. Specifically, they are selected from
the input range $[-10^\circ, 10^\circ]$ at a $2.5^\circ$ quantization step. These quantization levels can be viewed as
the discrete states of the system, similar to the discrete flight maneuvers of the previous example.

For the specification of the reach-avoid problem, the target set is chosen to be a square-shaped
hover region centered on the ground vehicle, specified in $(x_1, x_2)$ coordinates as

\[ R_x = [-0.2, 0.2]\ \text{m} \times [-0.2, 0.2]\ \text{m/s}. \]

The safe set in this case is chosen to be the set of all states within the domain of interest for which
the relative position remains within a desired bound and a desired velocity bound is satisfied,
specified in $(x_1, x_2)$ coordinates as

\[ W'_x = [-1.2, 1.2]\ \text{m} \times [-1, 1]\ \text{m/s}. \]

The corresponding sets $R_y$ and $W'_y$ in $(y_1, y_2)$ coordinates are chosen identically as above. The
target and safe sets in two dimensions are then defined as $R = R_x \times R_y$ and $W' = W'_x \times W'_y$,
respectively. Under a stochastic game formulation of the motion planning problem, the objective of the
quadrotor (player I) is to reach the hover region $R$ within finite time, while staying within the safe
set $W'$, subject to the worst-case acceleration inputs of the ground vehicle (player II).

Given that the dynamics, target set, and safe set in the x and y directions are decoupled and
identical, the problem reduces to a two dimensional probabilistic reach-avoid calculation in the
position-velocity space. For the numerical results to be shown here, we set the noise parameter
to $\sigma_i = 0.4$, the sampling time to $\Delta t = 0.1$ s, and the time horizon to one second ($N = 10$). The
disturbance input was discretized at 0.1 m/s$^2$ intervals for numerical computation. The numerical
computation is performed over the safe set $W'_x$, on a grid of size $61 \times 41$, using a method similar
to that of the preceding example.
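
The computation just described can be sketched in Python as a backward recursion on the $61 \times 41$ grid. This is a minimal sketch under stated assumptions, not the discretization scheme of Abate et al. (2007) used for the reported results: the expectation over the noise is approximated here by a three-point Gauss-Hermite rule with bilinear interpolation, the recursion is taken in the commonly used form $V_k = 1_{R_x} + 1_{W'_x \setminus R_x} \max_u \min_d E[V_{k+1}]$, the two noise components are assumed independent, and states leaving the grid are assigned value zero. The names `bilinear` and `reach_avoid_values` and the constant `G = 9.81` are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def bilinear(pos, vel, values, q_pos, q_vel):
    """Bilinearly interpolate `values` at query points; return 0 outside the grid."""
    f1 = (q_pos - pos[0]) / (pos[1] - pos[0])  # fractional grid index, position
    f2 = (q_vel - vel[0]) / (vel[1] - vel[0])  # fractional grid index, velocity
    inside = (f1 >= 0) & (f1 <= len(pos) - 1) & (f2 >= 0) & (f2 <= len(vel) - 1)
    i1 = np.clip(np.floor(f1).astype(int), 0, len(pos) - 2)
    i2 = np.clip(np.floor(f2).astype(int), 0, len(vel) - 2)
    t1, t2 = f1 - i1, f2 - i2
    interp = ((1 - t1) * (1 - t2) * values[i1, i2]
              + t1 * (1 - t2) * values[i1 + 1, i2]
              + (1 - t1) * t2 * values[i1, i2 + 1]
              + t1 * t2 * values[i1 + 1, i2 + 1])
    return np.where(inside, interp, 0.0)


def reach_avoid_values(horizon=10, n_pos=61, n_vel=41, dt=0.1, sigma=0.4):
    """Approximate max-min reach-avoid probabilities on a grid over the safe set."""
    pos = np.linspace(-1.2, 1.2, n_pos)
    vel = np.linspace(-1.0, 1.0, n_vel)
    x1, x2 = np.meshgrid(pos, vel, indexing="ij")
    in_target = (np.abs(x1) <= 0.2 + 1e-9) & (np.abs(x2) <= 0.2 + 1e-9)

    rolls = np.deg2rad(np.arange(-10.0, 10.0 + 1e-9, 2.5))  # quantized commands
    dists = np.linspace(-0.4, 0.4, 9)                       # 0.1 m/s^2 spacing

    # Three-point Gauss-Hermite rule per noise component (assumed quadrature).
    nodes, weights = np.polynomial.hermite_e.hermegauss(3)
    weights = weights / weights.sum()
    noise = sigma * dt * nodes
    joint_w = weights[:, None] * weights[None, :]

    values = in_target.astype(float)  # terminal condition: indicator of the target
    for _ in range(horizon):
        best = np.zeros_like(values)
        for roll in rolls:
            acc = G * np.sin(roll) + dists[:, None, None]  # one slice per disturbance
            mean_pos = x1 + dt * x2 + 0.5 * dt**2 * acc
            mean_vel = x2 + dt * acc
            q_pos = mean_pos[..., None, None] + noise[:, None]
            q_vel = mean_vel[..., None, None] + noise[None, :]
            nxt = bilinear(pos, vel, values, q_pos, q_vel)
            expected = (nxt * joint_w).sum(axis=(-2, -1))
            # Player II (disturbance) minimizes, then player I (control) maximizes.
            best = np.maximum(best, expected.min(axis=0))
        values = np.where(in_target, 1.0, best)
    return pos, vel, values
```

Calling `reach_avoid_values(horizon=10)` returns the grid axes and an array of approximate probabilities over the safe set. Because the quadrature and interpolation differ from the method used for the reported results, the output should not be expected to reproduce the figures exactly.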

The max-min probability $r^*_{s_0}(R, W')$ of satisfying the desired motion planning objectives is
shown in Fig. 4.5a over the safe set $W'_x$. The corresponding contours of this probability map
are shown in Fig. 4.5b, with the target set $R_x$ in the center. As a comparison, we also plot in
the same figure the result of a deterministic reachability calculation from Figure 3.6 of section 3.6,
characterizing the set of feasible initial conditions under the assumption that the noise obeys certain
deterministic bounds. One observation is that the deterministic reach-avoid set as computed using
the Hamilton-Jacobi methods described in chapter 3 bears a striking resemblance to the contours
of the probability map computed using the integro-difference equations described in section 4.4,
despite the relatively large noise variance. In particular, the horizon-10 reach-avoid set corresponds
roughly to the 0.8-superlevel set of the probability map $r^*_{s_0}(R, W')$.

Figure 4.5: Max-min reach-avoid probability $r^*_{s_0}(R, W')$ for quadrotor target tracking example
with $N = 10$. (a) Surface plot over $W'_x$. (b) Contour plot and comparison with deterministic
reach-avoid set (shown in dashed outline).

One interpretation of the results shown here is that the max-min probability is a quantification
of the risk of violating the reach-avoid specification, if the unmodeled dynamics and disturbances
behave statistically according to some noise distribution. Specifically, under the given stochastic
noise model and the probabilistic max-min control policy, system trajectories initiated from
within the deterministic reach-avoid set will not always satisfy the reach-avoid specification, but
will instead do so with a probability of about 80%. Another interpretation is that the probabilistic
formulation is a relaxation of the deterministic reachability specification. Namely, under the
unbounded noise distribution of a Gaussian model, it is impossible to synthesize, using the
deterministic methods described in section 3.6, a control policy that satisfies the reach-avoid
specification with probability one. On the other hand, if one were to allow the specification to be
satisfied with a certain level of confidence, for example with a probability of 80%, then the max-min
control policy as synthesized through the probabilistic reachability computation provides us with
the feedback maps needed to enforce such a probabilistic specification.

In order to illustrate the form of the max-min control policy, as well as to investigate the infinite
horizon properties of the reachability computation for this particular example, we lengthen the time
horizon to $N = 40$. The resulting max-min probability $r^*_{s_0}(R, W')$, along with the feedback map
$\mu^*_0$ synthesized from this computation, is shown in Figure 4.6. In this case, it was found that
the reachability computation indeed exhibits convergent behavior. In fact, over the entire safe
set $W'$, the difference between successive applications of the dynamic programming operator at
$N = 40$ was found to be no more than $4.9 \times 10^{-5}$ (about 0.005%). The probability map
$r^*_{s_0}(R, W')$ can then be interpreted as an approximation to the infinite horizon reach-avoid
probability. As described in section 4.6, the value functions produced through this dynamic
programming procedure can also be used to synthesize a max-min control policy over $W'$ that is
approximately optimal with respect to the infinite horizon payoff. Specifically, $\mu^*_0$ as shown in
Figure 4.6 is the first feedback map in this control policy, to be applied at the first time instant
$k = 0$. It is interesting to observe that this control policy has the form of a switching control policy.
Namely, over large portions of $W'$, the optimal control choice is bang-bang. On the other hand, near
the safety constraints of $W'$, the control law chooses an input of zero in order to prevent constraint
violation. This agrees with the intuition that the control policy for a reach-avoid problem has the
characteristics of a minimum-time-to-reach control law, which was also observed experimentally in
the results of section 3.6.

Figure 4.6: Max-min reach-avoid probability and control policy for quadrotor target tracking
example with $N = 40$. Panel titles: Max-min Reach-avoid Probability; Optimal Control Inputs
(horizontal axis: position in m). (b) Control policy map $\mu^*_0$.
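
The convergence check described above, in which the largest pointwise change between successive value functions is monitored, can be sketched as follows. The helper `iterate_until_converged` is hypothetical and generic: `operator` stands for one application of whatever dynamic programming backup is being used, and the tolerance is an illustrative choice rather than a value from the text.

```python
import numpy as np


def iterate_until_converged(operator, v0, tol=1e-4, max_iter=1000):
    """Apply `operator` repeatedly until the sup-norm change is at most `tol`.

    Returns the last iterate, the number of applications performed, and the
    final sup-norm difference between successive iterates.
    """
    v = np.asarray(v0, dtype=float)
    for n in range(1, max_iter + 1):
        v_next = operator(v)
        gap = float(np.max(np.abs(v_next - v)))  # largest pointwise change
        v = v_next
        if gap <= tol:
            return v, n, gap
    return v, max_iter, gap
```

A small reported gap indicates that further iterations change the value function very little on the grid. By itself it does not bound the distance to the infinite horizon value, which depends on the properties of the operator.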
Chapter 5


Partial  Information  in  Probabilistic 
Reachability  Problems 

5.1  Overview  and  Related  Work 

The controller design methods as presented in the preceding chapters are based upon an important
assumption that the discrete and continuous states of the hybrid system model can be directly
measured or observed. This is in fact a common assumption which appears in much of the literature on
hybrid reachability problems (see for example Maler et al., 1995; Lygeros et al., 1999b; Asarin et
al., 2000b; Shakernia et al., 2001; Aubin et al., 2002; Hwang et al., 2005; Koutsoukos and Riley,
2006; Gao et al., 2007; Abate et al., 2008; Tabuada, 2008; Girard et al., 2010; Mohajerin Esfahani
et al., 2011), and can be reasonable as long as the state measurements or state estimates are
sufficiently accurate with respect to the reachability specifications of interest. However, in the case that
the measurements or estimates exhibit significant uncertainties, for example, due to limitations of
what sensors can measure, imprecision in the sensor output, or measurement noise induced by the
operating environment, the reachability computation, as well as the controller synthesis procedure,
would need to account for the effects of decisions made under an imperfect representation
of the true system state.

In this chapter, we will study probabilistic safety and reach-avoid problems within the context
of a Partially Observable Discrete Time Stochastic Hybrid System (PODTSHS), which augments
the perfect information DTSHS model proposed in Amin et al. (2006) and Abate et al. (2008) with
a probabilistic observation model. In particular, the possible outcomes of qualitative observations
and quantitative measurements are encapsulated in an abstract observation space, while the
uncertainties in the observed information are modeled in terms of a conditional probability distribution
of the observations given the hybrid state. This distribution can be derived either from statistical
analysis of empirical data or from statistical assumptions upon the underlying noise or disturbance,
in a similar manner as the modeling of transition probabilities. Compared with the DTSHG model
of the preceding chapter, we neglect the game theoretic aspect of the reachability problem in order
to focus the discussion on issues related to partial observability.

The consideration of an imperfect observation model inevitably brings up the problem of state
estimation. With insight gained from classical analysis of linear and nonlinear systems (see for
example Callier and Desoer, 1991; Sastry, 1999), it can be inferred that the problem of hybrid
estimation is dual to the problem of hybrid control. In fact, hybrid estimation suffers from many
of the same difficulties as hybrid control, for example nonlinear filtering, estimation of switching
times, and estimation of continuous state under switching dynamics. In the realm of discrete state
or discrete event systems, partial observability can be modeled as an output map from either the
discrete state space or the discrete event space to an observation space (Ramadge, 1986; Caines et
al., 1988; Ozveren and Willsky, 1990). Concepts of observability are then formulated in terms of
the distinguishability of the initial condition or the current state, given the sequence of
observations, much in the same way as continuous state observability. Algorithms for estimating discrete
states, such as given in Caines et al. (1988) and Ozveren and Willsky (1990), typically involve
maintaining a set of discrete states that are compatible with the sequence of observations. Clearly,
if this set converges to a singleton, then this singleton corresponds to the exact system state.
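
The idea of maintaining a set of discrete states compatible with the observations can be illustrated with a few lines of Python. This is a generic sketch for a finite-state system with a nondeterministic transition relation and a state output map, and is not the specific algorithm of either cited work; the function name and the three-state example are hypothetical.

```python
def update_consistent_states(candidates, transitions, output_map, observation):
    """One step of a set-valued discrete state estimator.

    candidates:  set of states compatible with the observations so far
    transitions: dict mapping each state to the set of its possible successors
    output_map:  dict mapping each state to the output it produces
    observation: the newly received output
    """
    successors = set()
    for q in candidates:
        successors |= transitions.get(q, set())
    # Keep only the successors whose output matches the new observation.
    return {q for q in successors if output_map[q] == observation}


# Hypothetical three-state example in which states "a" and "b" share an output.
transitions = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
output_map = {"a": 0, "b": 0, "c": 1}
initial = {q for q in output_map if output_map[q] == 0}  # first observation is 0
estimate = update_consistent_states(initial, transitions, output_map, 1)
```

In this example the first observation leaves the two candidates `a` and `b`, and the second observation reduces the set to the singleton containing `c`, at which point the discrete state is known exactly.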

With respect to deterministic hybrid systems, the study of observability and state estimation
has largely focused on the class of linear systems with switching dynamics. When the switching
behavior is assumed to be known ahead of time, the system under analysis becomes a special
class of linear time-varying systems. Observability conditions for this class of systems can
be formulated through appropriate specialization of results from the study of linear time-varying
systems (Ezzine and Haddad, 1988; Szigeti, 1992). In the case that the switching input is
controlled, the estimation problem can be considered dual to the switching control problem, and
some conditions are given in Sun et al. (2002) for the existence of a switching input to render the
system observable. The work by Vidal et al. (2003) considers a scenario in which the switching
input is assumed to be an unknown piecewise constant function, and gives necessary and sufficient
conditions for observability, expressed in terms of matrix rank conditions. When the switching
behavior is autonomous, and the switching boundaries are assumed to be described by hyperplanes,
the system is classified as a piecewise linear or piecewise affine system. In the early work
of Sontag (1981), a sufficient condition is given for the existence of an observer which uniquely
determines the state of a discrete time piecewise linear system after a finite number of time steps.
From a computational perspective, an algorithm is provided in Bemporad et al. (2000c/) for
checking the observability of a discrete time piecewise affine system using the solution of a mixed-integer
linear program. Within a continuous time setting, the model of piecewise affine hybrid systems is
considered in Collins and van Schuppen (2004), and sufficient observability conditions are given,
along with procedures for constructing observers.

In the case of a stochastic hybrid system, the output trajectory corresponding to a given
initial condition can vary from one execution to another, either due to process noise or measurement
noise. Thus, the concept of observability does not generalize in a straightforward manner from the
analysis of deterministic systems. Some efforts towards probabilistic notions of observability,
however, can be found in Hwang et al. (2003) and Costa and do Val (2003), within the context of linear
systems with Markov discrete transitions. The investigation into hybrid estimators for stochastic
systems has its origins in deriving optimal estimators for linear Gaussian systems with a constant
parameter vector taking values within a finite set (see for example Magill, 1965; Lainiotis, 1971;
Maybeck, 1982), as motivated by applications in fault detection, target tracking, and adaptive
control. In generalizing this scenario to time-varying parameters, Ackerson and Fu (1970) proposed a
class of linear Gaussian models in which the noise parameters are allowed to make discrete
transitions according to a finite state Markov chain. This definition later evolved to encompass variations
in the system matrices, and became known as Jump Linear Systems (JLS). As noted in Ackerson
and Fu (1970), the optimal estimator, with respect to minimum mean square error (MMSE), is in
fact a weighted sum over an exponentially growing set of Kalman filters, corresponding to the set
of all possible switching sequences. Various suboptimal filtering schemes have since been
proposed (a thorough review of work prior to 1982 can be found in Tugnait (1982)). Most notably, the
Interacting Multiple Model (IMM) algorithm, as proposed in Blom and Bar-Shalom (1988), has
found significant success in target tracking applications (Bar-Shalom and Li, 1993). Extensions
of JLS estimation algorithms to semi-Markov models with probabilistic sojourn times are
discussed in Campo et al. (1991) and Petrov and Zubov (1996). Within the hybrid systems literature,
an alternative approach to the hypothesis merging procedure in IMM, based upon A* search over
the set of possible discrete trajectories, is discussed in Hofbaur and Williams (2002). Furthermore,
a generalization of the IMM algorithm has been proposed in Seah and Hwang (2009) for linear
Gaussian models whose discrete transitions are governed by stochastic guard conditions.
Finally, as alternatives to the traditional Kalman filtering algorithms, sampling-based methods such
as Markov Chain Monte Carlo algorithms (Doucet et al., 2000) and particle filtering algorithms
(Koutsoukos et al., 2003; Blom and Bloem, 2007) have also been applied to various models of
stochastic hybrid systems.

For control problems formulated in a partial information setting, the design of a feedback policy
needs to address the following questions:

•  What  is  the  information  needed  for  the  control  task  at  hand? 

•  How  can  we  construct  this  information  from  the  history  of  inputs  and  outputs? 

•  How  do  we  use  this  information  for  control  selection? 

The first two questions relate to the estimation aspect of the problem, while the last question relates
to the control aspect of the problem. In the case of a stability or regulation problem, what is needed
from an estimation perspective is a convergent estimator. For deterministic systems, the existence
of such an estimator is assured by sufficient conditions for observability. This is one of the reasons
that many of the studies in the deterministic case have focused on finding such conditions. The
practical construction of a convergent estimator, however, depends on issues of implementation.
In the case of a discrete state system, the estimation algorithm as proposed in Caines et al. (1988)
was used in Caines and Wang (1989) to design a dynamic observer, along with a control law for
using the discrete state estimates to drive the system trajectory to a target location. In the case
of linear systems, a common design for convergent observers is that of a Luenberger observer
(Luenberger, 1971), which has been extended by Alessandri and Coletta (2001) and Balluchi et
al. (2002) to design stabilizing controllers for hybrid systems with linear dynamics. For stochastic
optimal control problems with additive cost functions, it has been shown that the estimator needed
for optimal control selection, called a sufficient statistic, is characterized by a set of Bayesian
filtering equations which produce as their output the conditional distribution of the system state at
each time step (see for example Bertsekas and Shreve, 1978; Kumar and Varaiya, 1986). Under
mild technical assumptions, the expected value with respect to this distribution in fact coincides
with the MMSE estimate, which is again a contributing reason for the study of MMSE estimators,
or approximations thereof, in stochastic estimation.

Within  the  context  of  a  POdtSHS  model,  probabilistic  safety  and  reach-avoid  problems  are 
partial  information  stochastic  optimal  control  problems  with  multiplicative  or  sum-multiplicative 
payoffs.  Thus,  they  unfortunately  lie  beyond  the  common  classes  of  control  problems  as  described 
in  the  preceding  paragraph.  Perhaps  the  closest  relative  to  these  problems  in  the  optimal  control 
literature  is  the  partial  information  linear  exponential  Gaussian  (LEG)  problem,  whose  cost  is  an 
exponential  of  a  quadratic  function,  and  hence  multiplicative  (see  for  example  Speyer  et  al.,  1974; 
Whittle,  1981;  Kumar  and  van  Schuppen,  1981;  Fan  et  al.,  1994).  The  close  correlation  of  the 
structure of the cost with the form of the Gaussian distribution allows for the derivation of analytical
solutions (which is again not the case for probabilistic reachability problems due to the
indicator  functions  appearing  in  the  payoff).  Nonetheless,  as  shown  in  Whittle  (1981),  the  optimal 
estimate  for  the  LEG  problem  in  fact  depends  on  the  parameters  of  the  cost  function,  while  the 
optimal  control  law  depends  on  the  parameters  of  the  noise  distribution.  This  provides  an  example 
of a partial information optimal control problem in which the type of certainty equivalence principle
found in linear quadratic Gaussian (LQG) problems does not apply. Thus, as one considers
non-traditional  forms  of  cost  structure,  there  is  a  need  for  foundational  understanding  of  the  types 
of  estimation  issues  mentioned  previously. 

Before stating our main results, we will briefly review some previous work on partial information
reachability problems. For discrete event systems, Cieslak et al. (1988) and Lin and Wonham
(1988) considered problems of constructing supervisory controllers to satisfy language specifications
under partial observations. The control objectives in these problems can be viewed as
reachability  specifications  through  appropriate  interpretation  of  the  language  of  a  discrete  event 
system  under  a  supervisor  as  the  closed-loop  system  behavior.  Computational  complexity  issues 
of two-player discrete reachability games with partial information are discussed in Reif (1984). In
particular,  it  is  shown  that  the  problem  of  determining  the  existence  of  a  winning  strategy  for  one  of 
the  players  is  in  general  EXPTIME-complete.  An  algorithm  for  the  synthesis  of  winning  strategies 
for  two-player  reachability  games  on  graphs  is  proposed  in  Chatterjee  et  al.  (2006),  and  is  shown 
to achieve the EXPTIME bound in the worst case. Similar complexity results have been obtained for
stochastic  reachability  problems  in  probabilistic  models  of  two-player  games  with  discrete  state 
spaces  and  partial  information  (Alur  et  al.,  1995;  Bertrand  et  al.,  2009;  Gripon  and  Serre,  2009; 
Chatterjee  et  al.,  2010).  Within  the  class  of  hybrid  systems  referred  to  as  linear  hybrid  automata, 
the work of Henzinger and Kopke (1999) analyzed the complexity of partial information reachability
problems, and controller synthesis algorithms have been developed in Wong-Toi (1997) and
De  Wulf  et  al.  (2006).  Due  to  the  simplicity  of  the  continuous  dynamics  in  each  discrete  state,  as 
described  by  a  differential  inclusion  restricted  to  a  convex  polyhedron,  the  techniques  for  analysis 
and  control  are  often  based  upon  extensions  of  methods  developed  for  discrete  state  systems.  For 
a  discrete  time  hybrid  system  model  in  which  the  discrete  state  is  observed,  but  the  continuous 
state measurement contains error, Del Vecchio (2009) proposes a method for designing safety controllers
based upon a set-valued state estimate, computed under order preserving assumptions on
the  continuous  dynamics.  The  work  described  in  Del  Vecchio  et  al.  (2009)  studies  the  continuous 
time  version  of  this  problem  and  provides  a  separation  principle.  The  problem  of  discrete  state 
estimation,  under  the  assumption  that  the  continuous  state  is  measured  without  error,  is  addressed 
in Verma and Del Vecchio (2012). To the best of our knowledge, the results given here are among
the first of their kind on partial information reachability problems for a general class of stochastic
hybrid  systems. 

In this chapter, we present dynamic programming solutions to the partial information probabilistic
safety and reach-avoid problems for POdtSHS, as extensions of results in chapter 10 of
Bertsekas and Shreve (1978) on additive cost problems. In particular, we show that by augmenting
the state space with a binary random variable, the safety problem, which has a multiplicative
payoff  structure,  is  equivalent  to  a  terminal  cost  problem  on  the  augmented  state  space  (section 
5.3). The results of Bertsekas and Shreve (1978) can then be used to construct a sufficient statistic
in  terms  of  abstract  filtering  equations  which  update  a  conditional  distribution  of  the  augmented 
state.  This  distribution,  referred  to  as  an  information  state,  allows  us  to  derive  an  equivalent  perfect 
state  information  problem  and  a  dynamic  programming  algorithm  for  computing  the  optimal  safety 
probability  (section  5.4).  The  analysis  is  then  extended  to  the  reach-avoid  problem,  in  which  case 
it  is  shown  that  the  solution  is  an  additive  cost  dynamic  programming  algorithm  on  the  information 
state  space  (section  5.5).  We  then  state  several  consequences  of  these  results.  First,  it  is  shown  that 
in the case of perfect information, the class of memoryless, deterministic control policies is optimal
for the safety problem within the class of randomized control policies with memory, despite
the  multiplicative  payoff  (section  5.6).  This  provides  justification  for  the  restriction  of  attention  to 
such  policies  in  previous  work  on  perfect  information  probabilistic  reachability  problems  (Amin 
et  al.,  2006;  Abate  et  al.,  2008;  Summers  and  Lygeros,  2010).  Second,  we  consider  the  class  of 
Partially  Observable  Markov  Decision  Processes  (POMDPs),  which  can  be  viewed  as  POdtSHS 
models with discrete state, action, and observation spaces (section 5.7). In this case, the filtering
and  policy  computation  can  be  carried  out  on  an  augmented  state  space  with  twice  the  number  of 
discrete  states  as  the  original  model.  Third,  we  specialize  the  dynamic  programming  solution  to 
hybrid  system  models  with  probability  density  models,  in  which  case  the  sufficient  statistic  reduces 
to a set of Bayesian update equations for a conditional probability density over an augmented hybrid
state space (section 5.8). The practical implementation of this solution, however, depends on
the  existence  of  a  finite  dimensional  representation  for  the  conditional  density. 

5.2  Model  and  Problem  Formulation 

5.2.1  Partially  observable  Discrete  Time  Stochastic  Hybrid  System 

The  model  for  a  partially  observable  discrete  time  stochastic  hybrid  system  (POdtSHS)  augments 
the  DTSHS  model  proposed  in  Amin  et  al.  (2006)  and  Abate  et  al.  (2008)  with  an  observation  space 
and  a  stochastic  observation  model.  It  can  be  viewed  as  a  particular  instantiation  of  the  imperfect 
state information model given in Bertsekas and Shreve (1978).

Definition 5.1 (POdtSHS). A partially observable discrete time stochastic hybrid system is a tuple
$\mathcal{H} = (Q, n, C_a, Z, \nu_x, \nu_q, \nu_r, \zeta_0, \zeta)$, defined as follows.

• Discrete state space $Q := \{q_1, q_2, \ldots, q_m\}$, $m \in \mathbb{N}$;

• Dimensions of continuous state space $n : Q \to \mathbb{N}$: a map which assigns to each discrete state
$q \in Q$ the dimension of the continuous state space $\mathbb{R}^{n(q)}$. The hybrid state space is given by
$S := \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$;

• Control input space $C_a$: a nonempty Borel space;

• Observation space $Z$: a nonempty Borel space;

• Continuous state transition kernel $\nu_x : \mathcal{B}(\mathbb{R}^{n(\cdot)}) \times S \times C_a \to [0,1]$: a Borel-measurable
stochastic kernel on $\mathbb{R}^{n(\cdot)}$ given $S \times C_a$, which assigns to each $s = (q,x) \in S$ and $a \in C_a$ a
probability measure $\nu_x(\cdot|s,a)$ on the Borel space $(\mathbb{R}^{n(q)}, \mathcal{B}(\mathbb{R}^{n(q)}))$;

• Discrete state transition kernel $\nu_q : Q \times S \times C_a \to [0,1]$: a Borel-measurable discrete stochastic
kernel on $Q$ given $S \times C_a$, which assigns to each $s \in S$ and $a \in C_a$ a probability distribution
$\nu_q(\cdot|s,a)$ over $Q$;

• Reset transition kernel $\nu_r : \mathcal{B}(\mathbb{R}^{n(\cdot)}) \times S \times C_a \times Q \to [0,1]$: a Borel-measurable stochastic
kernel on $\mathbb{R}^{n(\cdot)}$ given $S \times C_a \times Q$, which assigns to each $s \in S$, $a \in C_a$, and $q' \in Q$ a probability
measure $\nu_r(\cdot|s,a,q')$ on the Borel space $(\mathbb{R}^{n(q')}, \mathcal{B}(\mathbb{R}^{n(q')}))$;

• Initial observation kernel $\zeta_0 : \mathcal{B}(Z) \times S \to [0,1]$: a Borel-measurable stochastic kernel on
$Z$ given $S$, which assigns to each $s \in S$ a probability measure $\zeta_0(\cdot|s)$ on the Borel space
$(Z, \mathcal{B}(Z))$;

• Observation kernel $\zeta : \mathcal{B}(Z) \times S \times C_a \to [0,1]$: a Borel-measurable stochastic kernel on $Z$
given $S \times C_a$, which assigns to each $s \in S$ and $a \in C_a$ a probability measure $\zeta(\cdot|s,a)$ on the
Borel space $(Z, \mathcal{B}(Z))$.

The definitions for an abstract observation space $Z$ and observation kernels $\zeta_0$ and $\zeta$ are based
upon the abstract model for an imperfect state information stochastic optimal control problem presented
in chapter 10 of Bertsekas and Shreve (1978). The generality of these definitions can be used
to treat a wide range of observation models found in practice. In particular, a discrete observation
space $O := \{o_1, o_2, \ldots\}$ and a Euclidean observation space $\mathbb{R}^{n_o}$ are both Borel spaces. Within
this context, the observation kernel $\zeta$ can be interpreted as the conditional distribution of the discrete
or continuous observations given a hybrid state $s \in S$ and control input $a \in C_a$. By the analysis
given in Davis (1993), we also have that the hybrid observation space defined by $Z = O \times \mathbb{R}^{n_o(\cdot)}$,
where $n_o : Q \to \mathbb{N}$ is the dimension of the continuous observation space in each discrete state, is
a Borel space. In this case, the kernel $\zeta$ can be interpreted as the joint conditional distribution
of the discrete and continuous observations given the hybrid state and control input. As a consequence,
it is worth noting that the perfect state information DTSHS model in Amin et al. (2006);
Abate et al. (2008) can be considered a special class of the POdtSHS by specifying $Z = S$ and
$\zeta_0(dz|s) = \zeta(dz|s,a) = \delta_s$, where $\delta_s$ is a probability measure on $S$ which assigns probability mass
one to the point $s$. To illustrate this modeling framework, consider the partially observable jump
linear system shown in Figure 5.1.


Figure  5.1:  Jump  linear  system  example  to  illustrate  POdtSHS  modeling  framework. 


Here we have $x \in \mathbb{R}^n$, $u \in \mathbb{R}^{n_u}$, $y \in \mathbb{R}^{n_o}$, $w \in \mathbb{R}^{n_w}$, $v \in \mathbb{R}^{n_v}$, and $A_i$, $B_i$, $C_i$, $G_i$, $H_i$, $\Sigma_w$, $\Sigma_v$ are matrices
of appropriate dimensions. In this example, the hybrid state space is given by $S = \{q_1, q_2\} \times \mathbb{R}^n$,
the control input space is given by $C_a = \mathbb{R}^{n_u}$, and the observation space is given by $Z = O \times \mathbb{R}^{n_o}$,
where $O = \{o_1, o_2\}$. The discrete transition kernel $\nu_q$ can be derived as $\nu_q(q_j|(q_i,x),u) = p_i$, if
$j = i$, and $\nu_q(q_j|(q_i,x),u) = 1 - p_i$, otherwise. The continuous and reset transition kernels are
described as $\nu_x(dx'|(q_i,x),u) = \nu_r(dx'|(q_i,x),u,q_j) \sim \mathcal{N}(A_i x + B_i u,\, G_i \Sigma_w G_i^T)$. Finally, the
observation kernels are given by $\zeta_0(o,dy|(q,x)) = \zeta(o,dy|(q,x),u) = \zeta_o(o|q)\,\zeta_x(dy|(q,x))$, where
$\zeta_x(dy|(q_i,x)) \sim \mathcal{N}(C_i x,\, H_i \Sigma_v H_i^T)$ and $\zeta_o(o_j|q_i) = \lambda_i$, if $j = i$, and $\zeta_o(o_j|q_i) = 1 - \lambda_i$, otherwise.
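
As an illustration of how the kernels of the jump linear example act, the following Python fragment samples one state transition and one observation in the scalar case. It is only a sketch under stated assumptions: all numerical values, the function names `step` and `observe`, and the 0/1 encoding of the two modes and the two discrete observations are hypothetical choices made here, not values from the example.

```python
import random

# Hypothetical scalar parameters for the two modes (index 0 -> q_1, index 1 -> q_2).
A = [0.9, 1.1]
B = [1.0, 0.5]
C = [1.0, 1.0]
G = [0.1, 0.2]
H = [0.05, 0.05]
P_STAY = [0.95, 0.90]   # p_i: probability of remaining in mode i
LAM = [0.8, 0.7]        # lambda_i: probability that mode i is observed correctly
SIGMA_W = 1.0           # variance of the process noise w (assumed scalar)
SIGMA_V = 1.0           # variance of the measurement noise v (assumed scalar)


def step(q, x, u, rng):
    """Sample (q', x') using the discrete kernel and the Gaussian continuous/reset kernel."""
    q_next = q if rng.random() < P_STAY[q] else 1 - q
    # In the scalar case G * Sigma_w * G^T reduces to G^2 * Sigma_w.
    std = abs(G[q]) * SIGMA_W ** 0.5
    x_next = rng.gauss(A[q] * x + B[q] * u, std)
    return q_next, x_next


def observe(q, x, rng):
    """Sample (o, y): a possibly wrong mode reading and a noisy linear measurement."""
    o = q if rng.random() < LAM[q] else 1 - q
    y = rng.gauss(C[q] * x, abs(H[q]) * SIGMA_V ** 0.5)
    return o, y
```

Because the continuous and reset kernels coincide in this example, a single Gaussian draw covers both the case in which the mode changes and the case in which it does not.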

Under a POdtSHS model, the available information at each time step $k$ is the observation and
input history $(z(0), a(0), \ldots, z(k-1), a(k-1), z(k))$, along with the probability distribution of the
initial state $(q(0), x(0))$. For compactness of notation, we define as in Bertsekas and Shreve (1978)
the information spaces

$$I_k = Z^{k+1} \times C_a^k, \quad k = 0, 1, \ldots.$$

An element of $I_k$ is called the information vector at time step $k$. For the initial state distribution,
we denote the set of probability measures on $S$ by $\mathcal{P}(S)$. By Corollary 7.25.1 of Bertsekas and
Shreve (1978), $\mathcal{P}(S)$ is also a Borel space. To keep the discussion general, we consider the set
of randomized control policies depending on the initial state distribution and information vector at
time $k$.

Definition 5.2. A policy $\pi'$ for $\mathcal{H}$ is a sequence $\pi' = (\pi'_0, \pi'_1, \ldots, \pi'_{N-1})$ of universally measurable
stochastic kernels $\pi'_k : \mathcal{B}(C_a) \times \mathcal{P}(S) \times I_k \to [0,1]$, which assigns to each initial distribution
$p_0 \in \mathcal{P}(S)$ and information vector $i_k \in I_k$ a probability measure $\pi'_k(da|p_0; i_k)$ on the Borel space
$(C_a, \mathcal{B}(C_a))$. The set of such policies is denoted by $\Pi'$.

If for each $k$, initial distribution $p_0 \in \mathcal{P}(S)$ and information vector $i_k \in I_k$ the stochastic kernel
$\pi'_k$ assigns probability mass one to some point in $C_a$, the policy $\pi'$ is said to be non-randomized.
The class of non-randomized policies for $\mathcal{H}$ is denoted as $\Pi$. For any $\pi' \in \Pi$, we can identify
the stochastic kernels $\pi'_k$, $k = 0, 1, \ldots, N-1$ with a sequence of universally measurable maps
$\pi_k : \mathcal{P}(S) \times I_k \to C_a$, $k = 0, 1, \ldots, N-1$ (see for example Bertsekas and Shreve, 1978, Corollary
7.44.3). As shown in Bertsekas and Shreve (1978), for an additive cost imperfect state information
stochastic optimal control problem, it is sufficient to consider the class of non-randomized policies
$\Pi$ over the set of general policies $\Pi'$. It turns out that a similar result also holds for the probabilistic
safety problem, which has a multiplicative cost structure.

Using a procedure similar to that described in section 4.2 for a DTSHG, one can construct from $\nu_x$,
$\nu_q$, and $\nu_r$ a Borel-measurable stochastic kernel $\nu : \mathcal{B}(S) \times S \times C_a \to [0,1]$ describing the hybrid
state evolution at each time step. With these definitions, the execution of the POdtSHS under a
given initial distribution $p_0 \in \mathcal{P}(S)$ and policy $\pi' \in \Pi'$ is as described in Algorithm 5.2.1.


Algorithm 5.2.1 POdtSHS Execution

Require: Initial distribution $p_0 \in \mathcal{P}(S)$ and control policy $\pi' \in \Pi'$.
Extract from $S$ a value $s_0$ according to $p_0$;
Extract from $Z$ a value $z_0$ according to $\zeta_0(\cdot|s_0)$;
Set $s(0) = s_0$ and $i_0 = z_0$;
for $k = 0$ to $N-1$ do
    Extract from $C_a$ a value $a_k$ for $a(k)$ according to $\pi'_k(\cdot|p_0; i_k)$;
    Extract from $S$ a value $s_{k+1}$ for $s(k+1)$ according to $\nu(\cdot|s_k, a_k)$;
    Extract from $Z$ a value $z_{k+1}$ for $z(k+1)$ according to $\zeta(\cdot|s_{k+1}, a_k)$;
    Set $i_{k+1} = (i_k, a(k), z(k+1))$;
end for
return Sample path $\{(s_0, z_0, a_0, \ldots, s_{N-1}, z_{N-1}, a_{N-1}, s_N, z_N)\}$.
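
The execution procedure of Algorithm 5.2.1 can be mirrored by a short simulation loop. The Python sketch below is one possible rendering, assuming that the kernels are supplied as sampling functions; the names `execute`, `demo`, and the sampler arguments are hypothetical, and the two-state toy model in `demo` is made up purely so that the code runs. The dependence of the policy on the initial distribution is assumed to be captured inside the `policy` callable.

```python
import random


def execute(sample_p0, sample_zeta0, sample_zeta, sample_nu, policy, N, rng):
    """Simulate one sample path (s_0, z_0, a_0, ..., s_N, z_N) of length 2 + 3N."""
    s = sample_p0(rng)                 # s_0 drawn from the initial distribution
    z = sample_zeta0(s, rng)           # z_0 drawn from the initial observation kernel
    info = (z,)                        # information vector i_0 = z_0
    path = [s, z]
    for k in range(N):
        a = policy(k, info, rng)       # a_k drawn from the policy given i_k
        s = sample_nu(s, a, rng)       # s_{k+1} drawn from the hybrid state kernel
        z = sample_zeta(s, a, rng)     # z_{k+1} drawn from the observation kernel
        info = info + (a, z)           # i_{k+1} = (i_k, a_k, z_{k+1})
        path += [a, s, z]
    return path


def demo(N=5, seed=0):
    """Toy two-state instance (states 0/1, single action 0); all numbers are made up."""
    rng = random.Random(seed)
    return execute(
        sample_p0=lambda r: 0,
        sample_zeta0=lambda s, r: s,
        sample_zeta=lambda s, a, r: s if r.random() < 0.9 else 1 - s,
        sample_nu=lambda s, a, r: s if r.random() < 0.8 else 1 - s,
        policy=lambda k, info, r: 0,
        N=N,
        rng=rng,
    )
```

Note that the policy only ever sees the information vector, never the state itself, which is the defining feature of the partial information setting.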


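Now consider the sample space of state, observation, and control sequences over $k$ time steps
given by $\Omega_k := S^{k+1} \times Z^{k+1} \times C_a^k$, equipped with the canonical product topology
$\mathcal{B}(\Omega_k) := \prod_{j=1}^{k+1} (\mathcal{B}(S) \times \mathcal{B}(Z)) \times \prod_{j=1}^{k} \mathcal{B}(C_a)$. Then by Proposition 7.45 of Bertsekas and Shreve
(1978), for a given initial distribution $p_0 \in \mathcal{P}(S)$ and policy $\pi' \in \Pi'$, the stochastic kernels $\nu$, $\zeta_0$,
and $\zeta$ induce a unique probability measure $P_k(\pi', p_0)$ on $\Omega_k$. In particular, on measurable rectangles,
which generate the Borel $\sigma$-algebra, $P_k(\pi', p_0)$ is defined as

$$
\begin{aligned}
P_k(\pi', p_0)\Big(\prod_{j=0}^{k-1} (S_j \times Z_j \times C_{a,j}) \times S_k \times Z_k\Big)
= {} & \int_{S_0}\int_{Z_0}\int_{C_{a,0}} \cdots \int_{S_{k-1}}\int_{Z_{k-1}}\int_{C_{a,k-1}}\int_{S_k}\int_{Z_k} \zeta(dz_k|s_k, a_{k-1})\, \nu(ds_k|s_{k-1}, a_{k-1}) \\
& \times \pi'_{k-1}(da_{k-1}|p_0; i_{k-1})\, \zeta(dz_{k-1}|s_{k-1}, a_{k-2})\, \nu(ds_{k-1}|s_{k-2}, a_{k-2}) \\
& \times \cdots\, \pi'_0(da_0|p_0; z_0)\, \zeta_0(dz_0|s_0)\, p_0(ds_0),
\end{aligned}
\tag{5.1}
$$

where $S_0, \ldots, S_k \in \mathcal{B}(S)$, $Z_0, \ldots, Z_k \in \mathcal{B}(Z)$, and $C_{a,0}, \ldots, C_{a,k-1} \in \mathcal{B}(C_a)$ are Borel subsets of the
state space, observation space, and control input space, respectively. In the following, we describe
how this probability measure allows us to quantify the probability of safety for a POdtSHS.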


5.2.2  Partial  Information  Safety  Problem 

Now consider the probabilistic safety problem. Assume that a Borel set $W \in \mathcal{B}(S)$ is given as a
safe set. The probability that the hybrid state trajectory $(s_0, s_1, \ldots, s_N)$ remains in $W$ for an initial
distribution $p_0 \in \mathcal{P}(S)$ and $\pi' \in \Pi'$ is given by

$$
\begin{aligned}
p^{\pi'}(p_0; W) &:= P_N(\pi', p_0)\big(\{(s_0, z_0, a_0, \ldots, s_{N-1}, z_{N-1}, a_{N-1}, s_N, z_N) : s_k \in W,\ \forall k \in [0, N]\}\big) \\
&= P_N(\pi', p_0)\big(W^{N+1} \times Z^{N+1} \times C_a^N\big).
\end{aligned}
\tag{5.2}
$$

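By (5.1) and Proposition 7.45 of Bertsekas and Shreve (1978), the safety probability in (5.2)
can be rewritten as

$$
\begin{aligned}
p^{\pi'}(p_0; W) = {} & \int_W \int_Z \int_{C_a} \cdots \int_W \int_Z \int_{C_a} \int_W \int_Z \zeta(dz_N|s_N, a_{N-1})\, \nu(ds_N|s_{N-1}, a_{N-1}) \\
& \times \pi'_{N-1}(da_{N-1}|p_0; i_{N-1})\, \zeta(dz_{N-1}|s_{N-1}, a_{N-2})\, \nu(ds_{N-1}|s_{N-2}, a_{N-2}) \\
& \times \cdots\, \pi'_0(da_0|p_0; z_0)\, \zeta_0(dz_0|s_0)\, p_0(ds_0) \\
= {} & \int_{\Omega_N} \prod_{k=0}^{N} 1_W(s_k)\, dP_N(\pi', p_0) = E^{\pi'}_{p_0}\Big[\prod_{k=0}^{N} 1_W(s_k)\Big],
\end{aligned}
\tag{5.3}
$$

where $E^{\pi'}_{p_0}$ denotes the expectation with respect to the probability measure $P_N(\pi', p_0)$ on the sample
space $\Omega_N$.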
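
The expectation form in (5.3) suggests a simple way to approximate a safety probability by simulation: average the product of safe-set indicators over sampled state trajectories. The Python sketch below illustrates this on a made-up two-state chain under a fixed policy (so observations play no role); the chain, its 0.1 exit probability, and the function names are assumptions for illustration only. For this toy chain the exact value is $0.9^N$, which gives a sanity check.

```python
import random


def product_of_indicators(states, safe_set):
    """Return prod_k 1_W(s_k) for one sampled trajectory."""
    prod = 1
    for s in states:
        prod *= 1 if s in safe_set else 0
    return prod


def sample_states(N, rng):
    """Toy chain: from 'safe', move to the absorbing 'unsafe' state w.p. 0.1 per step."""
    s = "safe"
    states = [s]
    for _ in range(N):
        if s == "safe" and rng.random() < 0.1:
            s = "unsafe"
        states.append(s)
    return states


def estimate_safety(N, num_samples, seed=0):
    """Monte Carlo estimate of E[prod_{k=0}^N 1_W(s_k)] for the toy chain."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        total += product_of_indicators(sample_states(N, rng), {"safe"})
    return total / num_samples
```

Such sampling only evaluates a given policy; it does not by itself address the optimization over policies posed below.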

Our control objective is to maximize this probability over the general policy space $\Pi'$. More
precisely, the problem statement is as follows:

Problem 5.1. Given a POdtSHS $\mathcal{H}$, initial distribution $p_0 \in \mathcal{P}(S)$, and safe set $W \in \mathcal{B}(S)$:

1. Compute the maximal probability of safety

$$p^*(p_0; W) := \sup_{\pi' \in \Pi'} p^{\pi'}(p_0; W);$$

2. Find an optimal policy $\pi^* \in \Pi'$, if it exists, such that $p^*(p_0; W) = p^{\pi^*}(p_0; W)$. Otherwise,
for a choice of $\epsilon > 0$, find an $\epsilon$-optimal policy $\pi^*_\epsilon \in \Pi'$ satisfying

$$p^{\pi^*_\epsilon}(p_0; W) \geq p^*(p_0; W) - \epsilon.$$

5.3  Sufficient  Statistics  and  Equivalent  Perfect  State 
Information  Problem 

It  is  well-known  in  the  stochastic  optimal  control  literature  that  under  an  additive  cost  structure, 
the  imperfect  state  information  problem  can  be  converted  into  one  of  perfect  state  information 
through  the  notion  of  sufficient  statistics,  which  is,  roughly  speaking,  an  estimator  which  provides 
enough  information  to  allow  optimal  control  selection  with  respect  to  the  history  of  observations 
and  controls  (see  for  example  Bertsekas  and  Shreve,  1978,  chapter  10).  Under  mild  assumptions, 
the  sufficient  statistic  for  an  additive  cost  problem  can  be  shown  to  be  the  conditional  probability 
distribution of the system state given the information vector. However, due to the multiplicative
cost structure of the probabilistic safety problem, it is no longer sufficient to maintain only a conditional
distribution of the current state; some information about the history of the state evolution is also
needed. As will be shown in this section, a sufficient statistic for our problem consists of a filtered
estimate of the current state, along with an augmented state variable which keeps track of whether the
state history has remained within the safe set $W$.

We  will  proceed  in  several  steps.  First,  the  POdtSHS  will  be  augmented  with  an  auxiliary 
state,  so  as  to  enable  an  equivalent  terminal  cost  formulation  of  Problem  5.1.  Second,  using  this 
terminal  cost  formulation,  a  statistic  sufficient  for  control  will  be  constructed  from  the  results  given 
in  Bertsekas  and  Shreve  (1978).  Finally,  an  equivalent  perfect  state  information  problem  will  be 
formulated  through  the  sufficient  statistic. 

5.3.1  Terminal  Cost  Problem  on  Augmented  System 

As a first step, we augment the POdtSHS model of the previous section with the binary random
variables $h_k : \Omega_k \to \{0, 1\}$, $k = 0, 1, \ldots, N$, defined as:

$$h_0 := 1; \qquad h_k := \prod_{j=0}^{k-1} 1_W(s_j), \quad k \geq 1. \tag{5.4}$$

For the rest of this chapter, we will refer to $h_k$ as the history state. Now consider an augmented
POdtSHS model with the expanded state space $\hat{S} = \{0, 1\} \times S$, in which the state of the system at
any time $k$ is given by the pair $(h_k, s_k)$. From (5.4), the history state can be recursively updated as

$$h_{k+1} = 1_W(s_k)\, h_k, \qquad h_0 = 1,$$

which results in an augmented state transition kernel $\hat{\nu} : \mathcal{B}(\hat{S}) \times \hat{S} \times C_a \to [0,1]$ defined as follows:

$$
\hat{\nu}((h_{k+1}, ds_{k+1})|(h_k, s_k), a_k) =
\begin{cases}
\nu(ds_{k+1}|s_k, a_k), & h_k = 0,\ h_{k+1} = 0 \\
0, & h_k = 0,\ h_{k+1} = 1 \\
1_{S \setminus W}(s_k)\, \nu(ds_{k+1}|s_k, a_k), & h_k = 1,\ h_{k+1} = 0 \\
1_W(s_k)\, \nu(ds_{k+1}|s_k, a_k), & h_k = 1,\ h_{k+1} = 1.
\end{cases}
\tag{5.5}
$$
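
For a model with finitely many states, the history-state recursion and the four cases of (5.5) collapse into a single deterministic update of the binary flag followed by the original transition. The Python sketch below illustrates this; the dictionary encoding `nu[s][a] = {s_next: probability}` and the function names are assumptions made here for illustration.

```python
def history_states(states, safe_set):
    """Return [h_0, ..., h_K] for states [s_0, ..., s_{K-1}] via h_{k+1} = 1_W(s_k) * h_k."""
    h = [1]
    for s in states:
        h.append((1 if s in safe_set else 0) * h[-1])
    return h


def augmented_kernel(nu, safe_set):
    """Build a finite-state version of the augmented kernel in (5.5).

    nu[s][a] is assumed to be a dict mapping next states to probabilities.
    The returned function gives the distribution of (h', s') given (h, s, a).
    """
    def nu_hat(h, s, a):
        # h' is a deterministic function of (h, s); all mass goes to that value.
        h_next = h * (1 if s in safe_set else 0)
        return {(h_next, s_next): p for s_next, p in nu[s][a].items()}

    return nu_hat
```

Once the flag drops to 0 it stays at 0, which is exactly the absorbing behavior expressed by the first two cases of (5.5).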


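Similarly, we can define the observation kernels $\hat{\zeta}_0 : \mathcal{B}(Z) \times \hat{S} \to [0,1]$ and $\hat{\zeta} : \mathcal{B}(Z) \times \hat{S} \times C_a \to
[0,1]$ on the extended state space:

$$\hat{\zeta}_0(dz_k|h_k, s_k) = \zeta_0(dz_k|s_k), \tag{5.6}$$

$$\hat{\zeta}(dz_k|h_k, s_k, a_{k-1}) = \zeta(dz_k|s_k, a_{k-1}). \tag{5.7}$$

Clearly, $\hat{\nu}$, $\hat{\zeta}_0$, and $\hat{\zeta}$ are Borel-measurable. We denote the augmented POdtSHS model by
$\hat{\mathcal{H}} := (\hat{S}, C_a, Z, \hat{\nu}, \hat{\zeta}_0, \hat{\zeta})$.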

Now consider a Borel-measurable function $\xi : \mathcal{P}(S) \to \mathcal{P}(\hat{S})$ which takes an initial state distribution
on $S$ to an initial state distribution on $\hat{S}$:

$$
\xi(p_0)(h_0, ds_0) =
\begin{cases}
0, & h_0 = 0 \\
p_0(ds_0), & h_0 = 1.
\end{cases}
\tag{5.8}
$$

Clearly, $\xi$ is one-to-one. As such, by Kuratowski's theorem (see for example Bertsekas and Shreve,
1978, Proposition 7.15), $\mathcal{P}(S)$ and $\xi(\mathcal{P}(S)) \subset \mathcal{P}(\hat{S})$ are isomorphic Borel spaces, with the Borel
isomorphism $\xi$.

We define the set of admissible control policies for an augmented POdtSHS model $\hat{\mathcal{H}}$ as
follows.

Definition 5.3. A policy $\hat{\pi}'$ for $\hat{\mathcal{H}}$ is a sequence $\hat{\pi}' = (\hat{\pi}'_0, \hat{\pi}'_1, \ldots, \hat{\pi}'_{N-1})$ of universally measurable
stochastic kernels $\hat{\pi}'_k : \mathcal{B}(C_a) \times \xi(\mathcal{P}(S)) \times I_k \to [0,1]$, which assigns to each initial distribution
$\xi(p_0)$ and information vector $i_k = (z_0, a_0, \ldots, z_{k-1}, a_{k-1}, z_k)$ a probability measure $\hat{\pi}'_k(da_k|\xi(p_0); i_k)$
on the Borel space $(C_a, \mathcal{B}(C_a))$. The set of such policies is denoted by $\hat{\Pi}'$.

Similarly as before, we denote the class of non-randomized policies for $\hat{\mathcal{H}}$ as $\hat{\Pi}$. Given that
$\xi$ is a Borel isomorphism, $\hat{\Pi}'$ and $\Pi'$ can be viewed as identical policy spaces. In particular, a
policy $\hat{\pi}' = (\hat{\pi}'_0, \ldots, \hat{\pi}'_{N-1}) \in \hat{\Pi}'$ can be identified with a policy $\pi' = (\pi'_0, \ldots, \pi'_{N-1}) \in \Pi'$ through
the relation

$$\hat{\pi}'_k(da_k|\xi(p_0); i_k) = \pi'_k(da_k|p_0; i_k). \tag{5.9}$$

For a given initial distribution $\xi(p_0) \in \xi(\mathcal{P}(S))$ and policy $\hat{\pi}' \in \hat{\Pi}'$, the stochastic kernels $\hat{\nu}$,
$\hat{\zeta}_0$, and $\hat{\zeta}$ induce a unique probability measure $\hat{P}_k(\hat{\pi}', \xi(p_0))$ on the sample space $\hat{\Omega}_k := \hat{S}^{k+1} \times
Z^{k+1} \times C_a^k$ defined similarly as in (4.1). Using this probability measure, we consider a probabilistic
reachability problem on the augmented system. Specifically, let $W \in \mathcal{B}(S)$ be a safe set. Then
given an initial distribution $\xi(p_0) \in \xi(\mathcal{P}(S))$ and policy $\hat{\pi}' \in \hat{\Pi}'$, the probability that the trajectory
$(\hat{s}_0, \hat{s}_1, \ldots, \hat{s}_N)$ terminates in a state such that $h_N = 1$ and $s_N \in W$ is given by

$$
\begin{aligned}
p^{\hat{\pi}'}(\xi(p_0); \{1\} \times W) &:= \hat{P}_N(\hat{\pi}', \xi(p_0))\big(\{(h_0, s_0, z_0, a_0, \ldots, h_N, s_N, z_N) : h_N = 1 \wedge s_N \in W\}\big) \\
&= \hat{P}_N(\hat{\pi}', \xi(p_0))\big(\{0,1\}^N \times \{1\} \times S^N \times W \times Z^{N+1} \times C_a^N\big) \\
&= \hat{E}^{\hat{\pi}'}_{\xi(p_0)}\big[1_{\{1\} \times W}(\hat{s}_N)\big].
\end{aligned}
\tag{5.10}
$$

Now consider a probabilistic reachability problem for $\hat{\mathcal{H}}$ defined as follows.

Problem 5.2. Given an initial distribution $p_0 \in \mathcal{P}(S)$ and an augmented POdtSHS $\hat{\mathcal{H}}$ defined
with respect to a Borel safe set $W \in \mathcal{B}(S)$:

1. Compute the maximal reachability probability

$$p^*(\xi(p_0); \{1\} \times W) := \sup_{\hat{\pi}' \in \hat{\Pi}'} p^{\hat{\pi}'}(\xi(p_0); \{1\} \times W);$$

2. Find an optimal policy $\hat{\pi}^* \in \hat{\Pi}'$, if it exists, such that $p^*(\xi(p_0); \{1\} \times W) = p^{\hat{\pi}^*}(\xi(p_0); \{1\} \times W)$.
Otherwise, for a choice of $\epsilon > 0$, find an $\epsilon$-optimal policy $\hat{\pi}^*_\epsilon \in \hat{\Pi}'$ satisfying

$$p^{\hat{\pi}^*_\epsilon}(\xi(p_0); \{1\} \times W) \geq p^*(\xi(p_0); \{1\} \times W) - \epsilon.$$

By the form of the cost function in (5.10), the above problem can be viewed as a terminal
cost problem for $\hat{\mathcal{H}}$. In the following, we will establish the equivalence between Problem 5.1 and
Problem 5.2.

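Proposition 5.1. Let $\mathcal{H} = (Q, n, C_a, Z, \nu_x, \nu_q, \nu_r, \zeta_0, \zeta)$ be a POdtSHS, and $W \in \mathcal{B}(S)$ be a Borel
safe set. Let $\hat{\mathcal{H}} = (\hat{S}, C_a, Z, \hat{\nu}, \hat{\zeta}_0, \hat{\zeta})$ be the corresponding augmented POdtSHS. Then for every
$p_0 \in \mathcal{P}(S)$, we have

$$p^*(p_0; W) = p^*(\xi(p_0); \{1\} \times W).$$

Proof. For every $p_0 \in \mathcal{P}(S)$, a policy $\pi' \in \Pi'$ for $\mathcal{H}$ is equivalent to a policy $\hat{\pi}' \in \hat{\Pi}'$ for $\hat{\mathcal{H}}$ by
(5.9). Through this equivalence (as induced by the Borel isomorphism $\xi$), it is sufficient to prove
that, for every $\pi' \in \Pi'$, the following equality holds:

$$p^{\pi'}(p_0; W) = p^{\hat{\pi}'}(\xi(p_0); \{1\} \times W).$$

Indeed, by the previous definitions,

$$
\begin{aligned}
p^{\hat{\pi}'}(\xi(p_0); \{1\} \times W)
= {} & \int_{\hat{\Omega}_N} 1_{\{1\} \times W}(\hat{s}_N)\, \hat{\zeta}(dz_N|\hat{s}_N, a_{N-1})\, \hat{\nu}(d\hat{s}_N|\hat{s}_{N-1}, a_{N-1}) \\
& \times \pi'_{N-1}(da_{N-1}|p_0; i_{N-1})\, \hat{\zeta}(dz_{N-1}|\hat{s}_{N-1}, a_{N-2})\, \hat{\nu}(d\hat{s}_{N-1}|\hat{s}_{N-2}, a_{N-2}) \\
& \times \cdots\, \pi'_0(da_0|p_0; z_0)\, \hat{\zeta}_0(dz_0|\hat{s}_0)\, \xi(p_0)(d\hat{s}_0) \\
= {} & \int_{\hat{S}^N \times S \times Z^{N+1} \times C_a^N} 1_W(s_N)\, \zeta(dz_N|s_N, a_{N-1})\, \nu(ds_N|s_{N-1}, a_{N-1}) \\
& \times \pi'_{N-1}(da_{N-1}|p_0; i_{N-1})\, \zeta(dz_{N-1}|s_{N-1}, a_{N-2}) \\
& \times 1_{\{1\} \times W}(\hat{s}_{N-1})\, \hat{\nu}(d\hat{s}_{N-1}|\hat{s}_{N-2}, a_{N-2})\, \pi'_{N-2}(da_{N-2}|p_0; i_{N-2}) \\
& \times \cdots\, \pi'_0(da_0|p_0; z_0)\, \hat{\zeta}_0(dz_0|\hat{s}_0)\, \xi(p_0)(d\hat{s}_0) \\
= {} & \int_{\hat{S} \times S^N \times Z^{N+1} \times C_a^N} \prod_{k=1}^{N} 1_W(s_k)\, \zeta(dz_N|s_N, a_{N-1})\, \nu(ds_N|s_{N-1}, a_{N-1}) \\
& \times \pi'_{N-1}(da_{N-1}|p_0; i_{N-1})\, \zeta(dz_{N-1}|s_{N-1}, a_{N-2})\, \nu(ds_{N-1}|s_{N-2}, a_{N-2}) \\
& \times \cdots \times \pi'_0(da_0|p_0; z_0)\, \zeta_0(dz_0|s_0)\, 1_{\{1\} \times W}(\hat{s}_0)\, \xi(p_0)(d\hat{s}_0) \\
= {} & \int_{\Omega_N} \prod_{k=0}^{N} 1_W(s_k)\, dP_N(\pi', p_0) = p^{\pi'}(p_0; W).
\end{aligned}
$$

This completes the proof. □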
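
The equivalence stated in Proposition 5.1 can be checked numerically on a small example. The Python sketch below does this for a finite Markov chain under a fixed policy, so that the control and observation kernels drop out; the encoding `T[s] = {s_next: probability}` and both function names are assumptions made for illustration. One function computes the safety probability directly, the other propagates the distribution of the augmented state $(h, s)$ and reads off the terminal probability of $\{1\} \times W$.

```python
def safety_probability(p0, T, safe, N):
    """P(s_0, ..., s_N all in W), computed by propagating only the safe mass."""
    mass = {s: p for s, p in p0.items() if s in safe}
    for _ in range(N):
        nxt = {}
        for s, m in mass.items():
            for s2, p in T[s].items():
                if s2 in safe:
                    nxt[s2] = nxt.get(s2, 0.0) + m * p
        mass = nxt
    return sum(mass.values())


def augmented_terminal_probability(p0, T, safe, N):
    """P(h_N = 1 and s_N in W) on the augmented chain started from xi(p0)."""
    dist = {(1, s): p for s, p in p0.items()}      # xi(p0): all mass on h_0 = 1
    for _ in range(N):
        nxt = {}
        for (h, s), m in dist.items():
            h2 = h * (1 if s in safe else 0)       # h_{k+1} = 1_W(s_k) * h_k
            for s2, p in T[s].items():
                nxt[(h2, s2)] = nxt.get((h2, s2), 0.0) + m * p
        dist = nxt
    return sum(m for (h, s), m in dist.items() if h == 1 and s in safe)
```

The two quantities agree up to floating-point rounding, reflecting that a multiplicative payoff on the original chain becomes a terminal payoff on the augmented one.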

5.3.2  Construction  of  a  Sufficient  Statistic 

By Proposition 5.1, the partial information safety problem for the original hybrid system $\mathcal{H}$ is
equivalent to a terminal reachability problem for the augmented hybrid system $\hat{\mathcal{H}}$. This allows us
to use results from Bertsekas and Shreve (1978) to construct a statistic sufficient for control of the
augmented system.

In the following, we adapt the definition given in Bertsekas and Shreve (1978) of a sufficient
statistic for general additive cost stochastic optimal control problems to the terminal cost problem
for $\hat{\mathcal{H}}$.

Definition 5.4. A statistic for $\hat{\mathcal{H}}$ is a sequence $(\eta_0, \eta_1, \ldots, \eta_{N-1})$ of Borel-measurable functions $\eta_k :
\xi(\mathcal{P}(S)) \times I_k \to B_k$, where $B_0, \ldots, B_{N-1}$ are nonempty Borel spaces. A statistic $(\eta_0, \eta_1, \ldots, \eta_{N-1})$
for $\hat{\mathcal{H}}$ is said to be sufficient for control if

1. For every $k = 0, \ldots, N-1$, there exists a Borel-measurable stochastic kernel $\tilde{\nu}(d\eta_{k+1}|\eta_k, a)$
on $B_{k+1}$ given $B_k \times C_a$ such that for every $p_0 \in \mathcal{P}(S)$, $\hat{\pi}' \in \hat{\Pi}'$, and $E_{k+1} \in \mathcal{B}(B_{k+1})$, the
following identity holds

$$\hat{P}_{k+1}(\hat{\pi}', \xi(p_0))\{\eta_{k+1}(\xi(p_0); i_{k+1}) \in E_{k+1} \mid \eta_k(\xi(p_0); i_k) = \eta,\ a_k = a\} = \tilde{\nu}(E_{k+1}|\eta, a)$$

for $\hat{P}_k(\hat{\pi}', \xi(p_0))$ almost every $(\eta, a)$.

2. There exists a lower semianalytic function $g_N : B_N \to [0,1]$ such that for every $p_0 \in \mathcal{P}(S)$
and $\hat{\pi}' \in \hat{\Pi}'$, the following identity holds

$$\hat{E}^{\hat{\pi}'}_{\xi(p_0)}\big[1_{\{1\} \times W}(\hat{s}_N) \mid \eta_N(\xi(p_0); i_N) = \eta\big] = g_N(\eta)$$

for $\hat{P}_N(\hat{\pi}', \xi(p_0))$ almost every $\eta$.

For the derivation of a sufficient statistic, we will use the notations $\xi(p_0) \in \xi(\mathcal{P}(S))$ and $p \in
\mathcal{P}(\hat{S})$ to distinguish between an initial distribution in the range of $\xi$ and a probability distribution
on $\hat{S}$ produced using recursive update equations. In particular, $p$ may not belong to the range of $\xi$.

As a first step, by Lemma 10.3 of Bertsekas and Shreve (1978), there exist Borel-measurable
stochastic kernels $\Phi_0(d\hat{s}|\xi(p_0); z)$ on $\hat{S}$ given $\xi(\mathcal{P}(S)) \times Z$ and $\Phi(d\hat{s}|p; z, a)$ on $\hat{S}$ given $\mathcal{P}(\hat{S}) \times
Z \times C_a$ which satisfy

$$\int_{E_1} \hat{\zeta}_0(E_2|\hat{s})\, \xi(p_0)(d\hat{s}) = \int_{\hat{S}} \int_{E_2} \Phi_0(E_1|\xi(p_0); z)\, \hat{\zeta}_0(dz|\hat{s})\, \xi(p_0)(d\hat{s}) \tag{5.11}$$

$$\int_{E_1} \hat{\zeta}(E_2|\hat{s}, a)\, p(d\hat{s}) = \int_{\hat{S}} \int_{E_2} \Phi(E_1|p; z, a)\, \hat{\zeta}(dz|\hat{s}, a)\, p(d\hat{s}) \tag{5.12}$$

for every Borel set $E_1 \in \mathcal{B}(\hat{S})$, $E_2 \in \mathcal{B}(Z)$, probability distribution $\xi(p_0) \in \xi(\mathcal{P}(S))$, $p \in \mathcal{P}(\hat{S})$,
and control action $a \in C_a$.

Now consider a function $\Psi : \mathcal{P}(\hat{S}) \times C_a \to \mathcal{P}(\hat{S})$, corresponding to the prediction step of a
hybrid state filter:

$$\Psi(p, a)(E) = \int_{\hat{S}} \hat{\nu}(E|\hat{s}, a)\, p(d\hat{s}), \quad \forall E \in \mathcal{B}(\hat{S}). \tag{5.13}$$

By Propositions 7.26 and 7.29 of Bertsekas and Shreve (1978), the mapping $\Psi$ is Borel-measurable.
For a given information vector $i_k \in I_k$ and initial distribution $\xi(p_0) \in \xi(\mathcal{P}(S))$, define the stochastic
kernels $p_k : \xi(\mathcal{P}(S)) \times I_k \to \mathcal{P}(\hat{S})$ through the following innovation equations:

$$
\begin{aligned}
p_0(\xi(p_0); i_0) &= \Phi_0(d\hat{s}_0|\xi(p_0); z_0), \\
p_{k+1}(\xi(p_0); i_{k+1}) &= \Phi(d\hat{s}_{k+1}|\Psi(p_k(\xi(p_0); i_k), a_k); z_{k+1}, a_k).
\end{aligned}
\tag{5.14}
$$

Clearly, these stochastic kernels are Borel-measurable. Furthermore, by Lemma 10.4 of Bertsekas
and Shreve (1978), $p_k(\xi(p_0); i_k)$ can be viewed as the conditional distribution of $(h_k, s_k)$ given the
information vector $i_k$ and initial distribution $\xi(p_0)$. Finally, by Proposition 10.5 of Bertsekas and
Shreve (1978), we have that the sequence $\{p_k(\xi(p_0); i_k)\}_{k=0}^{N}$ is a sufficient statistic for $\hat{\mathcal{H}}$.
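
When the state, action, and observation spaces are all finite, the prediction map in (5.13) and the update kernels characterized by (5.11) and (5.12) reduce to a matrix-free Bayes filter on the augmented state $(h, s)$. The Python sketch below illustrates one such filter under that finiteness assumption; the dictionary encodings `nu[s][a]`, `zeta[s][a]`, `zeta0[s]` and all function names are hypothetical choices made for this illustration.

```python
def normalize(weights):
    """Scale nonnegative weights to sum to one; fail on a zero-probability observation."""
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the predicted distribution")
    return {key: w / total for key, w in weights.items()}


def initial_info_state(p0, zeta0, z0):
    """Condition xi(p0), which puts all mass on h_0 = 1, on the first observation z0."""
    return normalize({(1, s): m * zeta0[s].get(z0, 0.0) for s, m in p0.items()})


def predict(p, nu, safe, a):
    """Prediction step: push p through the augmented transition kernel for action a."""
    q = {}
    for (h, s), m in p.items():
        h2 = h * (1 if s in safe else 0)
        for s2, pr in nu[s][a].items():
            q[(h2, s2)] = q.get((h2, s2), 0.0) + m * pr
    return q


def update(q, zeta, z, a):
    """Measurement step: Bayes rule with the likelihood zeta[s][a][z]."""
    return normalize({hs: m * zeta[hs[1]][a].get(z, 0.0) for hs, m in q.items()})


def filter_step(p, nu, zeta, safe, a, z):
    """One recursion p_k -> p_{k+1} given the applied action a and new observation z."""
    return update(predict(p, nu, safe, a), zeta, z, a)
```

The observation likelihood depends only on the $s$ component, in line with (5.6) and (5.7), while the history flag is carried along by the prediction step.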

In particular, a transition kernel $\tilde{\nu}$ for the statistic $p_k$ can be defined as

$$\tilde{\nu}(E_{k+1}|p_k, a_k) = \int_{\hat{S}} \int_{\hat{S}} \hat{\zeta}(G(p_k, a_k, E_{k+1})|\hat{s}_{k+1}, a_k)\, \hat{\nu}(d\hat{s}_{k+1}|\hat{s}_k, a_k)\, p_k(d\hat{s}_k), \tag{5.15}$$

where $G(p, a, E) = \{z \in Z \mid \Phi(\cdot|\Psi(p, a); z, a) \in E\}$. Thus, the evolution of $p_k(\xi(p_0); i_k)$ can be characterized
exclusively in terms of the stochastic kernel $\tilde{\nu}$. For the rest of this chapter, we will refer to
$p_k$ as an information state. The terminal cost with respect to this information state can be defined
as

$$g_N(p_N) := \int_{\hat{S}} 1_{\{1\} \times W}(\hat{s}_N)\, p_N(d\hat{s}_N) = \int_W p_N(1, ds_N). \tag{5.16}$$

In the following section, this function will be used to construct an equivalent perfect information
stochastic optimal control problem on the space of information states.

5.3.3  Reduction  to  Perfect  State  Information  Problem 

Consider a perfect state information model $\tilde{\mathcal{H}}$ in which the state space is given by $\tilde{S} := \mathcal{P}(\hat{S})$, the action space is given by $C_a$, and the state transition kernel is given by $\tilde{\nu}$. Define the set of admissible control policies for $\tilde{\mathcal{H}}$ as follows.

Definition 5.5. A policy $\tilde{\pi}'$ for $\tilde{\mathcal{H}}$ is a sequence $\tilde{\pi}' = (\tilde{\pi}'_0, \tilde{\pi}'_1, \ldots, \tilde{\pi}'_{N-1})$ of universally measurable stochastic kernels $\tilde{\pi}'_k : \mathcal{B}(C_a) \times \tilde{S}^{k+1} \times C_a^{k} \to [0, 1]$, assigning to each sequence of controls and information states $(\rho_0, a_0, \ldots, \rho_{k-1}, a_{k-1}, \rho_k)$ a probability measure $\tilde{\pi}'_k(da_k \mid \rho_0, a_0, \ldots, \rho_{k-1}, a_{k-1}, \rho_k)$ on the Borel space $(C_a, \mathcal{B}(C_a))$. The set of such policies is denoted by $\tilde{\Pi}'$.

If for each $k$, the stochastic kernel $\tilde{\pi}'_k$ depends on the history only through the current information state $\rho_k$, then the policy $\tilde{\pi}'$ is said to be Markov. If for each $k$ and history vector $(\rho_0, a_0, \ldots, \rho_{k-1}, a_{k-1}, \rho_k)$, the stochastic kernel $\tilde{\pi}'_k$ assigns probability mass one to some point in $C_a$, the policy $\tilde{\pi}'$ is said to be non-randomized. The class of non-randomized, Markov policies for $\tilde{\mathcal{H}}$ is denoted as $\tilde{\Pi}$. For any $\tilde{\pi} \in \tilde{\Pi}$, we can identify the stochastic kernels $\tilde{\pi}_k$ with a sequence of universally measurable maps $\tilde{\pi}_k : \tilde{S} \to C_a$ (see for example Bertsekas and Shreve, 1978, Corollary 7.44.3).

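Let $\tilde{\pi}' \in \tilde{\Pi}'$. Then by Proposition 7.44 of Bertsekas and Shreve (1978), the sequence $\hat{\pi}' = (\hat{\pi}'_0, \hat{\pi}'_1, \ldots, \hat{\pi}'_{N-1})$ defined by

\[ \hat{\pi}'_k(da_k \mid \xi(\rho_0); i_k) = \tilde{\pi}'_k(da_k \mid \rho_0(\xi(\rho_0); i_0), a_0, \ldots, a_{k-1}, \rho_k(\xi(\rho_0); i_k)) \tag{5.17} \]

is a policy belonging to $\hat{\Pi}'$. Through this identification, we can view $\tilde{\Pi}'$ as a subset of $\hat{\Pi}'$, and hence also of $\Pi'$.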

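Now consider the sample space of information state and control sequences over time horizon $N$ given by $\tilde{\Omega}_N := \tilde{S}^{N+1} \times C_a^{N}$, equipped with the canonical product $\sigma$-algebra $\mathcal{B}(\tilde{\Omega}_N) := \prod_{k=0}^{N} \mathcal{B}(\tilde{S}) \times \prod_{k=0}^{N-1} \mathcal{B}(C_a)$. Then for a given initial information state $\rho_0 \in \tilde{S}$ and policy $\tilde{\pi}' \in \tilde{\Pi}'$, the stochastic kernels $\tilde{\nu}$ and $\tilde{\pi}'_k$, $k = 0, 1, \ldots, N$, induce a unique probability measure $\tilde{P}^{\tilde{\pi}'}_{\rho_0}$ on $\tilde{\Omega}_N$.

Let $\rho_0 \in \tilde{S}$ and $\tilde{\pi}' \in \tilde{\Pi}'$, and consider an $N$-stage cost function defined by

\[ \tilde{J}_{N,\tilde{\pi}'}(\rho_0) := \int_{\tilde{\Omega}_N} g_N(\rho_N)\, d\tilde{P}^{\tilde{\pi}'}_{\rho_0}. \tag{5.18} \]

The perfect state information problem for $\tilde{\mathcal{H}}$ is stated as follows.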




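Problem 5.3. Given a perfect state information model $\tilde{\mathcal{H}}$ defined with respect to a Borel safe set $W \in \mathcal{B}(S)$:

1. Compute the optimal cost $\tilde{J}^*_N := \sup_{\tilde{\pi}' \in \tilde{\Pi}'} \tilde{J}_{N,\tilde{\pi}'}$;

2. Find an optimal policy $\tilde{\pi}^* \in \tilde{\Pi}'$, if it exists, such that $\tilde{J}^*_N(\rho_0) = \tilde{J}_{N,\tilde{\pi}^*}(\rho_0)$, $\forall \rho_0 \in \tilde{S}$. Otherwise, for a choice of $\epsilon > 0$, find an $\epsilon$-optimal policy $\tilde{\pi}^*_\epsilon \in \tilde{\Pi}'$ satisfying

\[ \tilde{J}_{N,\tilde{\pi}^*_\epsilon}(\rho_0) \geq \tilde{J}^*_N(\rho_0) - \epsilon, \quad \forall \rho_0 \in \tilde{S}. \]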

Given the terminal cost structure in (5.18), we can apply standard dynamic programming results for additive cost problems to obtain a solution to Problem 5.3 (see for example Bertsekas and Shreve, 1978, chapter 8). In particular, the $\epsilon$-optimal policies can be found within the class of non-randomized Markov policies $\tilde{\Pi}$. Before the discussion of this dynamic programming solution, we will first establish the connection between Problem 5.1 and Problem 5.3.

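Proposition 5.2. Let $\mathcal{H} = (Q, n, C_a, Z, \nu_x, \nu_q, \nu_r, \zeta_0, \zeta)$ be a POdtSHS and $W \in \mathcal{B}(S)$ be a Borel safe set. Let $\tilde{\mathcal{H}} = (\tilde{S}, C_a, \tilde{\nu})$ be the corresponding perfect state information model. Define a function $\varphi : \mathcal{P}(S) \to \mathcal{P}(\tilde{S})$ as

\[ \varphi(\rho_0)(E) = \int_{S} \zeta_0(\{ z_0 \mid \rho_0(\xi(\rho_0); z_0) \in E \} \mid s_0)\, \rho_0(ds_0), \tag{5.19} \]

for every Borel set $E \in \mathcal{B}(\tilde{S})$. Then we have

\[ p^*(\rho_0; W) = \int_{\tilde{S}} \tilde{J}^*_N(\tilde{\rho}_0)\, \varphi(\rho_0)(d\tilde{\rho}_0), \quad \forall \rho_0 \in \mathcal{P}(S). \]

Furthermore, if $\tilde{\pi}' \in \tilde{\Pi}'$ is optimal, or $\epsilon$-optimal for Problem 5.3, then $\tilde{\pi}'$ is also optimal, or $\epsilon$-optimal for Problem 5.1.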

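Proof. Let $\hat{\varphi} : \xi(\mathcal{P}(S)) \to \mathcal{P}(\tilde{S})$ be defined as

\[ \hat{\varphi}(\xi(\rho_0))(E) = \int_{\hat{S}} \hat{\zeta}_0(\{ z_0 \mid \rho_0(\xi(\rho_0); z_0) \in E \} \mid \hat{s}_0)\, \xi(\rho_0)(d\hat{s}_0), \]

for every Borel set $E \in \mathcal{B}(\tilde{S})$. Then it follows by Proposition 10.3 of Bertsekas and Shreve (1978) that

\[ \hat{p}^*(\xi(\rho_0); \{1\} \times W) = \int_{\tilde{S}} \tilde{J}^*_N(\tilde{\rho}_0)\, \hat{\varphi}(\xi(\rho_0))(d\tilde{\rho}_0), \]

for every $\xi(\rho_0) \in \xi(\mathcal{P}(S))$. Furthermore, if $\tilde{\pi}' \in \tilde{\Pi}'$ is optimal, or $\epsilon$-optimal for Problem 5.3, then $\tilde{\pi}'$ is also optimal, or $\epsilon$-optimal for Problem 5.2.

Thus, by Proposition 5.1 and the observation that $\varphi(\rho_0) = \hat{\varphi}(\xi(\rho_0))$, $\forall \rho_0 \in \mathcal{P}(S)$, we have the desired conclusion. □

5.4 Solution to Partial Information Safety Problem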

As  shown  in  Proposition  5.2,  solving  a  partial  information  safety  problem  defined  on  the  hybrid 
state  space  is  equivalent  to  solving  a  perfect  information  terminal  cost  problem  defined  on  the 
information  state  space.  In  this  section,  we  will  first  focus  on  solving  Problem  5.3.  This  then  in 
turn  provides  a  solution  to  Problem  5.1. 

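Specifically, consider a dynamic programming operator $\mathcal{T}_{\mathrm{safe}}$, which takes as its argument a universally measurable function $J : \tilde{S} \to [0, 1]$ and returns a function $\mathcal{T}_{\mathrm{safe}}(J) : \tilde{S} \to [0, 1]$:

\[ \mathcal{T}_{\mathrm{safe}}(J)(\rho) = \sup_{a \in C_a} \int_{\tilde{S}} J(\rho')\, \tilde{\nu}(d\rho' \mid \rho, a), \quad \rho \in \tilde{S}. \tag{5.20} \]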

The  solution  to  Problem  5.3  is  given  as  follows. 

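Proposition 5.3. Let $\tilde{\mathcal{H}} = (\tilde{S}, C_a, \tilde{\nu})$ be a perfect state information model defined with respect to a Borel safe set $W \in \mathcal{B}(S)$. Then

1. $\tilde{J}^*_N = \mathcal{T}^{N}_{\mathrm{safe}}(g_N)$;

2. For every $\epsilon > 0$, there exists an $\epsilon$-optimal non-randomized Markov policy $\tilde{\pi}^*_\epsilon \in \tilde{\Pi}$ for Problem 5.3. In particular,

\[ \tilde{J}^*_N := \sup_{\tilde{\pi}' \in \tilde{\Pi}'} \tilde{J}_{N,\tilde{\pi}'} = \sup_{\tilde{\pi} \in \tilde{\Pi}} \tilde{J}_{N,\tilde{\pi}}. \]

Proof. These statements are direct consequences of Propositions 8.2, 8.3, and 10.1 of Bertsekas and Shreve (1978). □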

From this result, we have that $\tilde{J}^*_N$ can be computed through recursive applications of the dynamic programming operator $\mathcal{T}_{\mathrm{safe}}$, initialized with the terminal cost $g_N$, and that the set of non-randomized Markov policies $\tilde{\Pi}$ is optimal over the set of general policies $\tilde{\Pi}'$. Furthermore, sufficient conditions of optimality can also be derived from the dynamic programming algorithm. For notational convenience, we define the optimal cost-to-go functions $\tilde{J}^*_{k \to N} : \tilde{S} \to [0, 1]$ by

\[ \tilde{J}^*_{k \to N} := \mathcal{T}^{N-k}_{\mathrm{safe}}(g_N), \quad k = 0, 1, \ldots, N. \tag{5.21} \]

By Proposition 5.3, it follows that $\tilde{J}^*_{0 \to N} = \tilde{J}^*_N$. Using this fact and standard dynamic programming arguments (see for example Bertsekas and Shreve, 1978, Proposition 8.2), we obtain the following corollary.

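Corollary 5.1.

1. If $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \ldots, \tilde{\pi}_{N-1}) \in \tilde{\Pi}$ satisfies

\[ \tilde{\pi}_k(\rho) \in \arg\sup_{a \in C_a} \int_{\tilde{S}} \tilde{J}^*_{k+1 \to N}(\rho')\, \tilde{\nu}(d\rho' \mid \rho, a), \tag{5.22} \]

for every $\rho \in \tilde{S}$ and $k = 0, 1, \ldots, N-1$, then $\tilde{\pi}$ is an optimal policy for Problem 5.3.

2. For a given $\epsilon > 0$, let $\{\epsilon_k\}_{k=0}^{N-1}$ be any sequence of positive real numbers such that $\sum_{k=0}^{N-1} \epsilon_k = \epsilon$. If $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \ldots, \tilde{\pi}_{N-1}) \in \tilde{\Pi}$ satisfies

\[ \int_{\tilde{S}} \tilde{J}^*_{k+1 \to N}(\rho')\, \tilde{\nu}(d\rho' \mid \rho, \tilde{\pi}_k(\rho)) \geq \tilde{J}^*_{k \to N}(\rho) - \epsilon_k, \tag{5.23} \]

for every $\rho \in \tilde{S}$ and $k = 0, 1, \ldots, N-1$, then $\tilde{\pi}$ is an $\epsilon$-optimal policy for Problem 5.3.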

Combining  Propositions  5.2  and  5.3,  and  Corollary  5.1,  we  arrive  at  the  main  result  of  this 
chapter. 

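Theorem 5.1. Let $\mathcal{H}$ be a POdtSHS and $W \in \mathcal{B}(S)$ be a Borel safe set. Let $\tilde{\mathcal{H}}$ be the corresponding perfect state information model. Define $g_N : \tilde{S} \to [0, 1]$ as in (5.16) and $\varphi : \mathcal{P}(S) \to \mathcal{P}(\tilde{S})$ as in (5.19). Then given $\rho_0 \in \mathcal{P}(S)$, we have

1. $p^*(\rho_0; W) = \int_{\tilde{S}} \mathcal{T}^{N}_{\mathrm{safe}}(g_N)(\tilde{\rho}_0)\, \varphi(\rho_0)(d\tilde{\rho}_0)$;

2. For every $\epsilon > 0$, there exists an $\epsilon$-optimal non-randomized policy $\pi^*_\epsilon \in \Pi$ for Problem 5.1 of the form

\[ \pi^*_{k,\epsilon}(\rho_0; i_k) = \tilde{\pi}^*_{k,\epsilon}(\rho_k(\xi(\rho_0); i_k)), \quad k = 0, 1, \ldots, N-1. \]

In particular,

\[ p^*(\rho_0; W) := \sup_{\pi' \in \Pi'} p^{\pi'}(\rho_0; W) = \sup_{\pi \in \Pi} p^{\pi}(\rho_0; W). \]

3. If $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \ldots, \tilde{\pi}_{N-1}) \in \tilde{\Pi}$ satisfies (5.22), then $\tilde{\pi}$ is an optimal policy for Problem 5.1. For a given $\epsilon > 0$, if $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \ldots, \tilde{\pi}_{N-1}) \in \tilde{\Pi}$ satisfies (5.23), then $\tilde{\pi}$ is an $\epsilon$-optimal policy for Problem 5.1.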

By this result, the optimal safety probability $p^*(\rho_0; W)$ for the POdtSHS can be computed through a terminal cost dynamic programming algorithm on the information state space $\tilde{S}$. Furthermore, the $\epsilon$-optimal policies can be found within the class of non-randomized policies which depend on the initial distribution $\rho_0$ and observation history $i_k$ only through the augmented information state $\rho_k(\xi(\rho_0); i_k)$. This decouples the partial information safety problem into two subproblems:

1. Computing the $\epsilon$-optimal control policy $\tilde{\pi}^*_\epsilon$ via the dynamic programming recursion given in Proposition 5.3;

2. Computing the conditional state distribution $\rho_k(\xi(\rho_0); i_k)$ through the innovation equations (5.14).

The first subproblem, which is the control aspect of the problem, can be performed in an offline setting, while the second subproblem, which is the estimation aspect of the problem, has to be performed in an online setting. In both of these problems, the difficulty of finding computationally tractable solutions is to a large extent associated with the representation of the information state $\rho_k$, as it determines the size of the space in which the filtering and dynamic programming algorithms need to take place. Given that $\rho_k$ is a probability distribution, which is infinite dimensional, rather than a hybrid state $s_k$, which is finite dimensional, it can be seen that the partial information safety problem is in general significantly more difficult than its perfect information counterpart.

It is observed in Kumar and Varaiya (1986) that the abrupt jump in complexity when one moves from a perfect information model to a partial information model is reflective of the dual role of control in a partial information optimal control problem. Namely, the choice of control affects not only the evolution of the actual state through the system dynamics, but also the sequence of observations generated by the state trajectory, and hence the availability of information. If more informative measurements or observations are made, then the uncertainty in state estimates would likely be reduced, leading to better choices of control inputs. However, the payoff gained by having better estimates needs to be balanced with the payoff lost in the process of obtaining better estimates. The use of an information state as a characterization of the estimation uncertainty allows the control to quantify the expected costs and benefits of a reduction in uncertainty. This improvement in control quality unfortunately comes at the expense of reasoning on the space of uncertainty representations, which may be much larger than the underlying hybrid state space.

For cases in which it is possible to find an $\epsilon$-optimal control policy $\tilde{\pi}^*_\epsilon$ for Problem 5.3, a control algorithm for the POdtSHS can be implemented, at least in principle, according to Algorithm 5.4.1. A block diagram illustration of this algorithm is shown in Figure 5.2.

Algorithm 5.4.1 POdtSHS Control Algorithm
Require: Initial distribution $\rho_0 \in \mathcal{P}(S)$ and policy $\tilde{\pi}^*_\epsilon \in \tilde{\Pi}$.
for $k = 0$ to $N - 1$ do
    Obtain a measurement $z_k$;
    Compute information state $\rho_k(\xi(\rho_0); i_k)$ using (5.14);
    Apply control input $a_k = \tilde{\pi}^*_{k,\epsilon}(\rho_k(\xi(\rho_0); i_k))$;
end for


Figure  5.2:  Block  diagram  of  POdtSHS  control  algorithm. 
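
The loop of Algorithm 5.4.1 can be sketched in code as follows. This is only an illustrative skeleton, not an implementation from the text: the callables `measure`, `init_filter`, `step_filter`, `policy`, and `actuate` are hypothetical placeholders for the sensor, the two innovation equations in (5.14), the precomputed policy, and the actuator, respectively.

```python
def run_controller(N, measure, init_filter, step_filter, policy, actuate):
    """Run the measure / filter / act loop of Algorithm 5.4.1 for N steps.

    measure(k)              -> observation z_k                 (hypothetical)
    init_filter(z0)         -> information state rho_0         (first line of (5.14))
    step_filter(rho, a, z)  -> next information state          (second line of (5.14))
    policy(k, rho)          -> control input a_k               (hypothetical)
    actuate(k, a)           -> applies a_k to the system       (hypothetical)
    """
    applied = []
    rho = None
    for k in range(N):
        z = measure(k)                           # obtain a measurement z_k
        if k == 0:
            rho = init_filter(z)                 # initialize the information state
        else:
            rho = step_filter(rho, applied[-1], z)  # filter with previous input
        a = policy(k, rho)                       # control depends on rho_k only
        actuate(k, a)
        applied.append(a)
    return applied
```

The policy is evaluated only on the current information state, which reflects the separation between offline policy computation and online filtering described above.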
5.5 Extension to Probabilistic Reach-avoid Problem


In  this  section,  we  will  discuss  how  the  analysis  of  the  preceding  sections  can  be  extended  to 
address  the  reach-avoid  problem.  In  particular,  due  to  the  sum-multiplicative  cost  structure,  the 
reach-avoid  problem  is  equivalent  to  an  additive  cost  stochastic  optimal  control  problem. 

More specifically, suppose that $R \in \mathcal{B}(S)$ is given as the target set and $W' \in \mathcal{B}(S)$ as the safe set, with $R \subseteq W'$. Then the probability that the state trajectory $(s_0, s_1, \ldots, s_N)$ of a POdtSHS $\mathcal{H}$ reaches $R$ while staying inside $W'$ for an initial distribution $\rho_0 \in \mathcal{P}(W')$ and $\pi' \in \Pi'$ is given by

\[
\begin{aligned}
r^{\pi'}(\rho_0; R, W') &:= P_N(\pi', \rho_0)\big(\{(s_0, z_0, a_0, \ldots, s_N, z_N) : \exists k \in [0, N],\ (s_k \in R) \wedge (s_j \in W',\ \forall j \in [0, k])\}\big) \\
&= P_N(\pi', \rho_0)\Big( \bigcup_{k=0}^{N} (W' \setminus R)^k \times R \times S^{N-k} \times Z^{N+1} \times C_a^{N} \Big) \\
&= \sum_{k=0}^{N} P_N(\pi', \rho_0)\big( (W' \setminus R)^k \times R \times S^{N-k} \times Z^{N+1} \times C_a^{N} \big),
\end{aligned}
\tag{5.24}
\]

where the final equality follows by the fact that the union is disjoint. From (5.1), this probability can be computed as

\[ r^{\pi'}(\rho_0; R, W') = E^{\pi'}_{\rho_0}\Big[ \sum_{k=0}^{N} \Big( \prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j) \Big) 1_R(s_k) \Big], \tag{5.25} \]

where $E^{\pi'}_{\rho_0}$ denotes the expectation with respect to the probability measure $P_N(\pi', \rho_0)$ on the sample space $\Omega_N$. As before, our control objective is to maximize this probability over the general policy space $\Pi'$. More precisely, the partial information reach-avoid problem for a POdtSHS is as follows:


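Problem 5.4. Given a POdtSHS $\mathcal{H}$, initial distribution $\rho_0 \in \mathcal{P}(S)$, target set $R \in \mathcal{B}(S)$, and safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

1. Compute the maximal reach-avoid probability

\[ r^*(\rho_0; R, W') := \sup_{\pi' \in \Pi'} r^{\pi'}(\rho_0; R, W'); \]

2. Find an optimal policy $\pi^* \in \Pi'$, if it exists, such that $r^*(\rho_0; R, W') = r^{\pi^*}(\rho_0; R, W')$. Otherwise, for a choice of $\epsilon > 0$, find an $\epsilon$-optimal policy $\pi^*_\epsilon \in \Pi'$ satisfying

\[ r^{\pi^*_\epsilon}(\rho_0; R, W') \geq r^*(\rho_0; R, W') - \epsilon. \]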


We define a modified history state for the reach-avoid problem as

\[ h_0 = 1; \quad h_k = \prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j), \quad k \geq 1. \tag{5.26} \]

The corresponding augmented POdtSHS model $\hat{\mathcal{H}}$, whose state at each time step $k$ is given by $(h_k, s_k)$, can be defined similarly as in Section 5.3.1.
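
To make the role of the history state concrete, the following sketch evaluates the sum-multiplicative cost inside the expectation of (5.25) along a single sample trajectory, carrying $h_k$ forward as in (5.26). The function name and the membership predicates `in_target` and `in_safe` are illustrative assumptions, not notation from the text.

```python
def reach_avoid_indicator(trajectory, in_target, in_safe):
    """Evaluate sum_k (prod_{j<k} 1_{W'\\R}(s_j)) * 1_R(s_k) along one trajectory.

    in_target(s) tests membership in R, in_safe(s) membership in W'.
    The running product is the history state h_k of (5.26).
    """
    h = 1        # h_0 = 1
    total = 0
    for s in trajectory:
        total += h * int(in_target(s))                 # stage term h_k * 1_R(s_k)
        h *= int(in_safe(s) and not in_target(s))      # h_{k+1} = h_k * 1_{W'\R}(s_k)
    return total
```

Because $h_k$ drops to zero as soon as the trajectory either hits $R$ or leaves $W'$, at most one term of the sum is nonzero, so the additive cost is an indicator of the reach-avoid event.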

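For a given initial distribution $\xi(\rho_0) \in \xi(\mathcal{P}(S))$ and policy $\hat{\pi}' \in \hat{\Pi}'$ for $\hat{\mathcal{H}}$, consider the following additive cost function

\[ \hat{r}^{\hat{\pi}'}(\xi(\rho_0); R, W') := \hat{E}^{\hat{\pi}'}_{\xi(\rho_0)}\Big[ \sum_{k=0}^{N} 1_{\{1\} \times R}(\hat{s}_k) \Big], \tag{5.27} \]

where $\hat{E}^{\hat{\pi}'}_{\xi(\rho_0)}$ denotes the expectation with respect to the probability measure $\hat{P}_N(\hat{\pi}', \xi(\rho_0))$ on the sample space $\hat{\Omega}_N := \hat{S}^{N+1} \times Z^{N+1} \times C_a^{N}$. In the following, we use $\hat{r}^{\hat{\pi}'}(\xi(\rho_0); R, W')$ to define an additive cost optimal control problem for the augmented POdtSHS.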

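Problem 5.5. Given an initial distribution $\rho_0 \in \mathcal{P}(S)$ and an augmented POdtSHS $\hat{\mathcal{H}}$ defined with respect to a target set $R \in \mathcal{B}(S)$ and a safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

1. Compute the optimal cost

\[ \hat{r}^*(\xi(\rho_0); R, W') := \sup_{\hat{\pi}' \in \hat{\Pi}'} \hat{r}^{\hat{\pi}'}(\xi(\rho_0); R, W'); \]

2. Find an optimal policy $\hat{\pi}^* \in \hat{\Pi}'$, if it exists, such that $\hat{r}^*(\xi(\rho_0); R, W') = \hat{r}^{\hat{\pi}^*}(\xi(\rho_0); R, W')$. Otherwise, for a choice of $\epsilon > 0$, find an $\epsilon$-optimal policy $\hat{\pi}^*_\epsilon \in \hat{\Pi}'$ satisfying

\[ \hat{r}^{\hat{\pi}^*_\epsilon}(\xi(\rho_0); R, W') \geq \hat{r}^*(\xi(\rho_0); R, W') - \epsilon. \]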

We now proceed to establish the equivalence between Problem 5.4 and Problem 5.5.

Proposition 5.4. For every $\rho_0 \in \mathcal{P}(S)$ and $R, W' \in \mathcal{B}(S)$ such that $R \subseteq W'$, we have

\[ r^*(\rho_0; R, W') = \hat{r}^*(\xi(\rho_0); R, W'). \]

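Proof. By the equivalence of the policy spaces $\Pi'$ and $\hat{\Pi}'$, it is again sufficient to prove that, for every $\pi' \in \Pi'$, the following equality holds:

\[ r^{\pi'}(\rho_0; R, W') = \hat{r}^{\pi'}(\xi(\rho_0); R, W'). \]

By the previous definitions,

\[
\begin{aligned}
\hat{r}^{\pi'}(\xi(\rho_0); R, W') &= \int_{\hat{\Omega}_N} \sum_{k=0}^{N} 1_{\{1\} \times R}(\hat{s}_k)\, d\hat{P}_N(\pi', \xi(\rho_0)) \\
&= \sum_{k=0}^{N} \int_{\hat{\Omega}_k} 1_{\{1\} \times R}(\hat{s}_k)\, d\hat{P}_k(\pi', \xi(\rho_0)).
\end{aligned}
\]

Using a similar line of argument as in the proof of Proposition 5.1, it can be shown that for every $k = 0, 1, \ldots, N$,

\[ \int_{\hat{\Omega}_k} 1_{\{1\} \times R}(\hat{s}_k)\, d\hat{P}_k(\pi', \xi(\rho_0)) = \int_{\Omega_k} \Big( \prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j) \Big) 1_R(s_k)\, dP_k(\pi', \rho_0). \]

Thus, we have

\[
\begin{aligned}
\hat{r}^{\pi'}(\xi(\rho_0); R, W') &= \sum_{k=0}^{N} \int_{\Omega_k} \Big( \prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j) \Big) 1_R(s_k)\, dP_k(\pi', \rho_0) \\
&= \int_{\Omega_N} \sum_{k=0}^{N} \Big( \prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j) \Big) 1_R(s_k)\, dP_N(\pi', \rho_0) \\
&= E^{\pi'}_{\rho_0}\Big[ \sum_{k=0}^{N} \Big( \prod_{j=0}^{k-1} 1_{W' \setminus R}(s_j) \Big) 1_R(s_k) \Big] \\
&= r^{\pi'}(\rho_0; R, W').
\end{aligned}
\]

The desired conclusion then follows. □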


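Applying the set of procedures given in Section 5.3.2, we can derive an information state $\rho_k(\xi(\rho_0); i_k)$, $k = 0, 1, \ldots, N$, for the augmented POdtSHS model $\hat{\mathcal{H}}$. By Proposition 10.5 of Bertsekas and Shreve (1978), this then becomes a sufficient statistic for Problem 5.5. Let $\tilde{\mathcal{H}} = (\tilde{S}, C_a, \tilde{\nu})$ be the corresponding perfect state information model. Consider a Borel-measurable one-stage cost $g : \tilde{S} \to [0, 1]$ defined by

\[ g(\rho) := \int_{\hat{S}} 1_{\{1\} \times R}(\hat{s})\, \rho(d\hat{s}) = \int_{R} \rho(1, ds). \tag{5.28} \]

Let $\rho_0 \in \tilde{S}$ and $\tilde{\pi}' \in \tilde{\Pi}'$, and define an $N$-stage cost function as

\[ \tilde{J}_{N,\tilde{\pi}'}(\rho_0) := \int_{\tilde{\Omega}_N} \sum_{k=0}^{N} g(\rho_k)\, d\tilde{P}^{\tilde{\pi}'}_{\rho_0}. \tag{5.29} \]

The perfect state information problem is stated as follows.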

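Problem 5.6. Given a perfect state information model $\tilde{\mathcal{H}}$ defined with respect to a target set $R \in \mathcal{B}(S)$ and a safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$:

1. Compute the optimal cost $\tilde{J}^*_N := \sup_{\tilde{\pi}' \in \tilde{\Pi}'} \tilde{J}_{N,\tilde{\pi}'}$;

2. Find an optimal policy $\tilde{\pi}^* \in \tilde{\Pi}'$, if it exists, such that $\tilde{J}^*_N(\rho_0) = \tilde{J}_{N,\tilde{\pi}^*}(\rho_0)$, $\forall \rho_0 \in \tilde{S}$. Otherwise, for a choice of $\epsilon > 0$, find an $\epsilon$-optimal policy $\tilde{\pi}^*_\epsilon \in \tilde{\Pi}'$ satisfying

\[ \tilde{J}_{N,\tilde{\pi}^*_\epsilon}(\rho_0) \geq \tilde{J}^*_N(\rho_0) - \epsilon, \quad \forall \rho_0 \in \tilde{S}. \]

By an almost identical argument as in the proof of Proposition 5.2, we have the following result establishing the connection between Problem 5.4 and Problem 5.6.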

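Proposition 5.5. Let $\mathcal{H} = (Q, n, C_a, Z, \nu_x, \nu_q, \nu_r, \zeta_0, \zeta)$ be a POdtSHS and $R, W' \in \mathcal{B}(S)$ be Borel sets such that $R \subseteq W'$. Let $\tilde{\mathcal{H}} = (\tilde{S}, C_a, \tilde{\nu})$ be the corresponding perfect state information model for the reach-avoid problem. Define a function $\varphi : \mathcal{P}(S) \to \mathcal{P}(\tilde{S})$ as in (5.19). Then we have

\[ r^*(\rho_0; R, W') = \int_{\tilde{S}} \tilde{J}^*_N(\tilde{\rho}_0)\, \varphi(\rho_0)(d\tilde{\rho}_0), \quad \forall \rho_0 \in \mathcal{P}(S). \]

Furthermore, if $\tilde{\pi}' \in \tilde{\Pi}'$ is optimal, or $\epsilon$-optimal for Problem 5.6, then $\tilde{\pi}'$ is also optimal, or $\epsilon$-optimal for Problem 5.4.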

Using standard dynamic programming results for additive cost problems, we can also derive a solution to Problem 5.6, which in turn provides a solution to Problem 5.4. Specifically, consider a dynamic programming operator $\mathcal{T}_{\mathrm{ra}}$ as defined by

\[ \mathcal{T}_{\mathrm{ra}}(J)(\rho) = \sup_{a \in C_a} \Big[ g(\rho) + \int_{\tilde{S}} J(\rho')\, \tilde{\nu}(d\rho' \mid \rho, a) \Big], \quad \rho \in \tilde{S} \tag{5.30} \]

for universally measurable functions $J : \tilde{S} \to [0, 1]$.

Then by Propositions 8.2 and 8.3 of Bertsekas and Shreve (1978), we have the following dynamic programming result.

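Proposition 5.6. Let $\tilde{\mathcal{H}} = (\tilde{S}, C_a, \tilde{\nu})$ be a perfect state information model defined with respect to a target set $R \in \mathcal{B}(S)$ and a safe set $W' \in \mathcal{B}(S)$ such that $R \subseteq W'$. Then

1. $\tilde{J}^*_N = \mathcal{T}^{N}_{\mathrm{ra}}(g)$;

2. For every $\epsilon > 0$, there exists an $\epsilon$-optimal non-randomized Markov policy $\tilde{\pi}^*_\epsilon \in \tilde{\Pi}$ for Problem 5.6. In particular,

\[ \tilde{J}^*_N := \sup_{\tilde{\pi}' \in \tilde{\Pi}'} \tilde{J}_{N,\tilde{\pi}'} = \sup_{\tilde{\pi} \in \tilde{\Pi}} \tilde{J}_{N,\tilde{\pi}}. \]

Combining Propositions 5.5 and 5.6, a solution to Problem 5.4 can now be stated.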

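Theorem 5.2. Let $\mathcal{H}$ be a POdtSHS and $R, W' \in \mathcal{B}(S)$ be Borel sets such that $R \subseteq W'$. Let $\tilde{\mathcal{H}}$ be the corresponding perfect state information model for the reach-avoid problem. Define $g : \tilde{S} \to [0, 1]$ as in (5.28) and $\varphi : \mathcal{P}(S) \to \mathcal{P}(\tilde{S})$ as in (5.19). Then given $\rho_0 \in \mathcal{P}(S)$, we have

1. $r^*(\rho_0; R, W') = \int_{\tilde{S}} \mathcal{T}^{N}_{\mathrm{ra}}(g)(\tilde{\rho}_0)\, \varphi(\rho_0)(d\tilde{\rho}_0)$;

2. For every $\epsilon > 0$, there exists an $\epsilon$-optimal non-randomized policy $\pi^*_\epsilon \in \Pi$ for Problem 5.4 of the form

\[ \pi^*_{k,\epsilon}(\rho_0; i_k) = \tilde{\pi}^*_{k,\epsilon}(\rho_k(\xi(\rho_0); i_k)), \quad k = 0, 1, \ldots, N-1. \]

In particular,

\[ r^*(\rho_0; R, W') := \sup_{\pi' \in \Pi'} r^{\pi'}(\rho_0; R, W') = \sup_{\pi \in \Pi} r^{\pi}(\rho_0; R, W'). \]

3. Let $\tilde{J}^*_{k \to N} := \mathcal{T}^{N-k}_{\mathrm{ra}}(g)$, $k = 0, 1, \ldots, N$. If $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \ldots, \tilde{\pi}_{N-1}) \in \tilde{\Pi}$ satisfies

\[ \tilde{\pi}_k(\rho) \in \arg\sup_{a \in C_a} \int_{\tilde{S}} \tilde{J}^*_{k+1 \to N}(\rho')\, \tilde{\nu}(d\rho' \mid \rho, a), \tag{5.31} \]

for every $\rho \in \tilde{S}$ and $k = 0, 1, \ldots, N-1$, then $\tilde{\pi}$ is an optimal policy for Problem 5.4. For a given $\epsilon > 0$, let $\{\epsilon_k\}_{k=0}^{N-1}$ be any sequence of positive real numbers such that $\sum_{k=0}^{N-1} \epsilon_k = \epsilon$. If $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \ldots, \tilde{\pi}_{N-1}) \in \tilde{\Pi}$ satisfies

\[ \int_{\tilde{S}} \tilde{J}^*_{k+1 \to N}(\rho')\, \tilde{\nu}(d\rho' \mid \rho, \tilde{\pi}_k(\rho)) \geq \tilde{J}^*_{k \to N}(\rho) - \epsilon_k, \tag{5.32} \]

for every $\rho \in \tilde{S}$ and $k = 0, 1, \ldots, N-1$, then $\tilde{\pi}$ is an $\epsilon$-optimal policy for Problem 5.4.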

Thus, the partial information reach-avoid problem can be solved using an additive cost dynamic programming algorithm on the information state space $\tilde{S}$. For the rest of this chapter, we will focus
our  attention  on  the  probabilistic  safety  problem,  with  the  understanding  that  any  result  proved 
for  the  safety  problem  can  be  generalized  to  the  reach-avoid  problem  through  the  set  of  minor 
modifications  in  argument  as  described  in  this  section. 

5.6  Sufficiency  of  Non-Randomized  Markov  Policies  for  the 
Perfect  Information  Safety  Problem 

In this section, we will state a consequence of the preceding results for the special case in which the hybrid state is perfectly observed. Specifically, consider a POdtSHS model $\mathcal{H}$ with observation space $Z = S$ and observation model $\zeta_0(dz \mid s) = \zeta(dz \mid s, a) = \delta_s$. For this system, it will be shown that the class of non-randomized Markov policies, which select inputs deterministically based upon measurement of the current state $z_k = s_k$, is optimal for the probabilistic safety problem, within the general class of randomized non-Markov policies. In other words, for the perfect information case, it is unnecessary to randomize one's choice of controls or maintain memory of the history of hybrid states and controls, despite the multiplicative cost structure of the safety problem. This provides formal justification for the restriction of attention to this class of policies in previous work by Amin et al. (2006), Abate et al. (2008), and Summers and Lygeros (2010) on perfect information probabilistic reachability problems for DTSHS.

As consistent with a perfect state information model, we assume that the initial condition $s_0 \in S$ of $\mathcal{H}$ is measured. This results in an initial distribution $\rho_0 = \delta_{s_0}$. The maximal safety probability, as a specialization of the definitions in section 5.2.2, is then parameterized only by the initial condition $s_0$, and will be thus denoted simply as $p^*_{s_0}(W)$. It can be verified that a sufficient statistic for the perfect information safety problem is given by the sequence of functions $\eta_k : \xi(\mathcal{P}(S)) \times I_k \to \hat{S}$ defined as

\[ \eta_0(\xi(\rho_0); i_0) = (1, z_0), \]
\[ \eta_k(\xi(\rho_0); i_k) = \Big( \prod_{j=0}^{k-1} 1_W(z_j),\ z_k \Big), \quad k \geq 1. \]

By the definition of the history state $h_k$ and the fact that $z_k = s_k$ in the perfect information case, it follows that $\eta_k(\xi(\rho_0); i_k) = (h_k, s_k)$, $\forall k \geq 0$. In other words, the information state for $\mathcal{H}$ is simply the augmented system state $\hat{s}_k = (h_k, s_k)$, with the transition kernel $\hat{\nu}$ as defined in (5.5), and the terminal cost $g_N(\hat{s}_N) = 1_{\{1\} \times W}(\hat{s}_N)$. Thus, by a special case of Theorem 5.1, we have

\[ p^*_{s_0}(W) = \sup_{\hat{\pi} \in \hat{\Pi}} p^{\hat{\pi}}_{s_0}(W), \quad \forall s_0 \in S, \tag{5.33} \]

where $\hat{\Pi}$ is the class of policies consisting of elements $\hat{\pi} = (\hat{\pi}_0, \hat{\pi}_1, \ldots, \hat{\pi}_{N-1})$, such that $\hat{\pi}_k : \hat{S} \to C_a$ is a deterministic function of the augmented state. Now consider a subclass $\mathcal{M}$ of such policies which select inputs independently of the history state $h_k$, namely $\hat{\pi}_k(0, s_k) = \hat{\pi}_k(1, s_k) = \mu_k(s_k)$ for some function $\mu_k : S \to C_a$. In the following, we will proceed to prove that it is sufficient to restrict one's attention to the policy class $\mathcal{M}$.

Proposition 5.7. Let $\mathcal{H}$ be a perfect state information model. Then given an initial condition $s_0 \in S$ and a safe set $W \in \mathcal{B}(S)$, we have

\[ p^*_{s_0}(W) = \sup_{\mu \in \mathcal{M}} p^{\mu}_{s_0}(W), \quad \forall s_0 \in S. \]

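Proof. Given (5.33), it is sufficient to prove that the following equality holds:

\[ \sup_{\hat{\pi} \in \hat{\Pi}} p^{\hat{\pi}}_{s_0}(W) = \sup_{\mu \in \mathcal{M}} p^{\mu}_{s_0}(W), \quad \forall s_0 \in S. \]

First, by virtue of $\mathcal{M}$ being a subset of $\hat{\Pi}$, it can be inferred that

\[ \sup_{\hat{\pi} \in \hat{\Pi}} p^{\hat{\pi}}_{s_0}(W) \geq \sup_{\mu \in \mathcal{M}} p^{\mu}_{s_0}(W), \quad \forall s_0 \in S. \]

Now we proceed to show the reverse inequality. Fix any policy $\hat{\pi} = (\hat{\pi}_0, \hat{\pi}_1, \ldots, \hat{\pi}_{N-1}) \in \hat{\Pi}$, and consider a policy $\mu = (\mu_0, \mu_1, \ldots, \mu_{N-1}) \in \mathcal{M}$ defined as

\[ \mu_k(s_k) = \hat{\pi}_k(1, s_k), \quad \forall s_k \in S,\ k = 0, 1, \ldots, N-1. \]

By Proposition 5.1, we have $p^{\hat{\pi}}_{s_0}(W) = \hat{p}^{\hat{\pi}}_{(1, s_0)}(\{1\} \times W)$, $\forall s_0 \in S$. This implies that for every initial condition $s_0 \in S$,

\[
\begin{aligned}
p^{\hat{\pi}}_{s_0}(W) &= \int_{\hat{S}^{N+1}} 1_{\{1\} \times W}(\hat{s}_N) \prod_{k=0}^{N-1} \hat{\nu}(d\hat{s}_{k+1} \mid \hat{s}_k, \hat{\pi}_k(\hat{s}_k))\, \delta_{(1, s_0)}(d\hat{s}_0) \\
&= \int_{\hat{S}^{N} \times S} 1_W(s_N)\, \nu(ds_N \mid s_{N-1}, \hat{\pi}_{N-1}(1, s_{N-1}))\, 1_{\{1\} \times W}(\hat{s}_{N-1}) \prod_{k=0}^{N-2} \hat{\nu}(d\hat{s}_{k+1} \mid \hat{s}_k, \hat{\pi}_k(\hat{s}_k))\, \delta_{(1, s_0)}(d\hat{s}_0) \\
&= \int_{\hat{S} \times S^{N}} \Big( \prod_{k=0}^{N-1} 1_W(s_{k+1})\, \nu(ds_{k+1} \mid s_k, \hat{\pi}_k(1, s_k)) \Big) 1_{\{1\} \times W}(\hat{s}_0)\, \delta_{(1, s_0)}(d\hat{s}_0) \\
&= 1_W(s_0) \int_{S^{N}} \prod_{k=0}^{N-1} 1_W(s_{k+1})\, \nu(ds_{k+1} \mid s_k, \mu_k(s_k)) = p^{\mu}_{s_0}(W).
\end{aligned}
\]

In other words, for every $\hat{\pi} \in \hat{\Pi}$, there exists a choice of policy $\mu \in \mathcal{M}$ that does at least as well. Thus, $p^{\hat{\pi}}_{s_0}(W) \leq \sup_{\mu \in \mathcal{M}} p^{\mu}_{s_0}(W)$, $\forall s_0 \in S$, $\hat{\pi} \in \hat{\Pi}$. The desired result then follows. □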

We note briefly that in a perfect state information model, the sequence of functions mapping $(\xi(\rho_0); i_k)$ to $z_k$, $k \geq 0$, is not a sufficient statistic in the strict sense of Definition 5.4. However, due to the particular form of the cost function in the probabilistic safety problem, only decisions made with respect to safe trajectories (i.e. $h_k = 1$, $\forall k$) contribute to the final payoff. Thus, the controls for $h_k = 0$ can be chosen as identical to those for $h_k = 1$.
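
For a finite-state illustration of this sufficiency result, the product form of $p^{\mu}_{s_0}(W)$ in the proof of Proposition 5.7 suggests a backward recursion over state-feedback maps. The sketch below is an assumption-laden example rather than an algorithm given in the text: it takes a finite state and action set, a hypothetical transition array `P[a, s, s']`, and a boolean mask `safe` for $W$.

```python
import numpy as np

def perfect_info_safety(P, safe, N):
    """Backward recursion V_k(s) = 1_W(s) * max_a sum_{s'} P[a, s, s'] * V_{k+1}(s').

    P    : array of shape (num_actions, num_states, num_states), rows sum to one
    safe : boolean mask of the safe set W
    Returns V_0 (maximal safety probability per initial state) and a list of
    greedy state-feedback maps mu_0, ..., mu_{N-1}.
    """
    safe = np.asarray(safe, dtype=float)
    V = safe.copy()                        # V_N(s) = 1_W(s)
    policy = []
    for _ in range(N):
        Q = P @ V                          # Q[a, s]: expected next-stage value
        policy.append(Q.argmax(axis=0))    # non-randomized Markov choice per state
        V = safe * Q.max(axis=0)           # unsafe states contribute zero
    policy.reverse()                       # policy[k] is the map applied at time k
    return V, policy
```

Each `policy[k]` depends only on the current state, matching the class $\mathcal{M}$ considered above; no history state needs to be stored in this perfectly observed case.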


5.7  Specialization  to  Partially  Observable  Markov  Decision 
Processes 

In order to illustrate the state estimation and dynamic programming procedures for discrete models, we now consider a special case of the POdtSHS model in which the state space, control input space, and observation space are finite. Namely, $S = Q$, $C_a = \Sigma$, and $Z = O$, for some finite sets $Q$, $\Sigma$, and $O$. Such a model is commonly referred to in the literature as a Partially Observable Markov Decision Process (POMDP) (see for example Russell and Norvig, 2002; Thrun et al., 2005).

Given a finite state space, the state transition kernel $\nu$ can be summarized in terms of a transition probability $p_q : Q \times \Sigma \times Q \to [0, 1]$ such that

\[ \nu(Q' \mid q, \sigma) = \sum_{q' \in Q'} p_q(q' \mid q, \sigma), \quad \forall Q' \subseteq Q,\ q \in Q,\ \sigma \in \Sigma. \]

For simplicity of notation, we assume that the observations are not affected by the control inputs. In this case, the observation kernels can be summarized in terms of an observation probability $p_o : Q \times O \to [0, 1]$ such that

\[ \zeta_0(O' \mid q) = \zeta(O' \mid q, \sigma) = \sum_{o' \in O'} p_o(o' \mid q). \]


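We denote the POMDP model specified above as $\mathcal{H}_{\mathrm{POMDP}} = (Q, \Sigma, O, p_q, p_o)$. For a given safe set $W \subseteq Q$, the corresponding augmented system model is given by $\hat{\mathcal{H}}_{\mathrm{POMDP}} = (\hat{Q}, \Sigma, O, \hat{p}_q, \hat{p}_o)$, where $\hat{Q}$, $\hat{p}_q$ and $\hat{p}_o$ are defined as

\[
\begin{aligned}
\hat{Q} &= \{0, 1\} \times Q, \\
\hat{p}_q(h_{k+1}, q_{k+1} \mid (h_k, q_k), \sigma_k) &=
\begin{cases}
p_q(q_{k+1} \mid q_k, \sigma_k), & h_k = 0,\ h_{k+1} = 0 \\
0, & h_k = 0,\ h_{k+1} = 1 \\
1_{Q \setminus W}(q_k)\, p_q(q_{k+1} \mid q_k, \sigma_k), & h_k = 1,\ h_{k+1} = 0 \\
1_{W}(q_k)\, p_q(q_{k+1} \mid q_k, \sigma_k), & h_k = 1,\ h_{k+1} = 1,
\end{cases} \\
\hat{p}_o(o_k \mid h_k, q_k) &= p_o(o_k \mid q_k), \quad h_k = 0, 1,
\end{aligned}
\tag{5.34}
\]

for every $q_k, q_{k+1} \in Q$ and $o_k \in O$. As discussed in section 5.3.1, for a given initial state distribution $\rho_0 \in \mathcal{P}(Q)$ and policy $\hat{\pi}' \in \hat{\Pi}'$, the transition probability $\hat{p}_q$ and observation probability $\hat{p}_o$ induce a unique probability measure $\hat{P}_k(\hat{\pi}', \xi(\rho_0))$ on the sample space $\hat{\Omega}_k := \hat{Q}^{k+1} \times O^{k+1} \times \Sigma^{k}$.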

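First we will show that the state filtering equations in (5.14) in this case simplify to a Bayesian update rule for the augmented POMDP $\hat{\mathcal{H}}_{\mathrm{POMDP}}$. The precise statement is as follows.

Lemma 5.1. Let $\mathcal{H}_{\mathrm{POMDP}}$ be a POMDP and $W \subseteq Q$ be a safe set. Let $\hat{\mathcal{H}}_{\mathrm{POMDP}}$ be the corresponding augmented POMDP. Then for every $\rho_0 \in \mathcal{P}(Q)$, $\hat{\pi}' \in \hat{\Pi}'$, $\hat{q}_k \in \hat{Q}$, and $k = 0, 1, \ldots$, we have

\[ \rho_0(\xi(\rho_0); i_0)(\hat{q}_0) = \frac{\hat{p}_o(o_0 \mid \hat{q}_0)\, \xi(\rho_0)(\hat{q}_0)}{\sum_{\hat{q}_0 \in \hat{Q}} \hat{p}_o(o_0 \mid \hat{q}_0)\, \xi(\rho_0)(\hat{q}_0)}, \]

\[ \rho_k(\xi(\rho_0); i_k)(\hat{q}_k) = \frac{\hat{p}_o(o_k \mid \hat{q}_k)\, \rho_{k|k-1}(\xi(\rho_0); i_{k-1}, \sigma_{k-1})(\hat{q}_k)}{\sum_{\hat{q}_k \in \hat{Q}} \hat{p}_o(o_k \mid \hat{q}_k)\, \rho_{k|k-1}(\xi(\rho_0); i_{k-1}, \sigma_{k-1})(\hat{q}_k)}, \]

where

\[ \rho_{k|k-1}(\xi(\rho_0); i_{k-1}, \sigma_{k-1})(\hat{q}_k) = \sum_{\hat{q}_{k-1} \in \hat{Q}} \hat{p}_q(\hat{q}_k \mid \hat{q}_{k-1}, \sigma_{k-1})\, \rho_{k-1}(\xi(\rho_0); i_{k-1})(\hat{q}_{k-1}), \]

for $\hat{P}_k(\hat{\pi}', \xi(\rho_0))$ almost every $i_k$.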

The proof of this result largely revolves around manipulating the abstract definitions of section 5.3.2, and can be found in appendix C. Using the transition probability $p_q$ and observation probability $p_o$, the filtering equations in the statement of Lemma 5.1 can be rewritten as follows.

1. Initialization Step:

\[ \rho_0(\xi(\rho_0); i_0)(1, q_0) = \frac{p_o(o_0 \mid q_0)\, \rho_0(q_0)}{\sum_{q_0 \in Q} p_o(o_0 \mid q_0)\, \rho_0(q_0)}, \qquad \rho_0(\xi(\rho_0); i_0)(0, q_0) = 0; \tag{5.35} \]

2. Prediction Step:

\[ \rho_{k+1|k}(\xi(\rho_0); i_k, \sigma_k)(1, q_{k+1}) = \sum_{q_k \in W} p_q(q_{k+1} \mid q_k, \sigma_k)\, \rho_k(\xi(\rho_0); i_k)(1, q_k), \tag{5.36} \]

\[ \rho_{k+1|k}(\xi(\rho_0); i_k, \sigma_k)(0, q_{k+1}) = \sum_{q_k \in Q} p_q(q_{k+1} \mid q_k, \sigma_k)\, \rho_k(\xi(\rho_0); i_k)(0, q_k) + \sum_{q_k \in Q \setminus W} p_q(q_{k+1} \mid q_k, \sigma_k)\, \rho_k(\xi(\rho_0); i_k)(1, q_k); \tag{5.37} \]

3. Update Step:

\[ \rho_{k+1}(\xi(\rho_0); i_{k+1})(h_{k+1}, q_{k+1}) = \frac{p_o(o_{k+1} \mid q_{k+1})\, \rho_{k+1|k}(\xi(\rho_0); i_k, \sigma_k)(h_{k+1}, q_{k+1})}{\sum_{(h_{k+1}, q_{k+1}) \in \hat{Q}} p_o(o_{k+1} \mid q_{k+1})\, \rho_{k+1|k}(\xi(\rho_0); i_k, \sigma_k)(h_{k+1}, q_{k+1})}. \tag{5.38} \]

From  the  Bayesian  update  equations  (5.35)— (5.38),  it  can  be  observed  that  estimation  in  the 
case  of  a  partial  information  safety  problem  for  the  POMDP  involves  maintaining  a  discrete  prob¬ 
ability  distribution  over  an  augmented  state  space  Q  containing  twice  the  number  of  discrete  states 
as  the  original  state  space  Q.  In  particular,  if  one  were  to  marginalize  the  augmented  distribution 
pk  over  the  history  state  hk,  one  recovers  the  regular  POMDP  update  equations  for  the  conditional 
distribution  of  qk.  More  specifically,  let  pk\k(;\P(hk)  be  the  conditional  state  distribution  of  qk  over 
Q ,  given  the  initial  distribution  po  and  the  information  vector  ik.  The  update  equations  for  pk\k  can 
be  found  in  for  example  Chapter  17  of  Russell  and  Norvig  (2002)  or  Chapter  15  of  Thrun  et  al. 
(2005).  Then  it  can  be  verified  that 

pk\k(<ik\po',ik)  =  M£(po);4)(i,?*)+M£(po);4)(o,®0»  e  Q- 

Thus,  the  augmented  information  state  pk  provides  slightly  more  information  about  the  history  of 
state  evolution  (i.e.  the  safety  of  past  trajectory)  as  compared  with  the  regular  information  state 
pk\k.  As  we  will  illustrate  through  an  example  later  on,  this  extra  information  can  be  important  in 
a  probabilistic  safety  problem. 
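The filtering recursion (5.35)-(5.38) and the marginalization property can be sketched numerically as follows. This is a minimal illustration rather than the implementation used for the results in this chapter: the function names are hypothetical, and the four-state model data (a chain in which $\sigma_L$ shifts the state left and $\sigma_R$ shifts it right, with $W = \{q_2, q_3\}$) are assumed for the sake of the example.

```python
import numpy as np

def init_belief(rho0, like0):
    """Initialization (5.35): all posterior mass starts in the safe component h = 1."""
    w = like0 * rho0
    return np.zeros_like(w), w / w.sum()           # (p^0_0, p^1_0)

def step_belief(p0, p1, Pq_sigma, like, safe):
    """Prediction (5.36)-(5.37) followed by the Bayesian update (5.38)."""
    pred1 = Pq_sigma.T @ (safe * p1)               # history stays safe only if q_k was in W
    pred0 = Pq_sigma.T @ (p0 + (~safe) * p1)       # unsafe history, or safe history leaving W
    w0, w1 = like * pred0, like * pred1
    norm = w0.sum() + w1.sum()
    return w0 / norm, w1 / norm

def step_regular(p, Pq_sigma, like):
    """Ordinary POMDP belief update, kept for comparison."""
    w = like * (Pq_sigma.T @ p)
    return w / w.sum()

# Hypothetical model data, assumed for illustration only.
alpha, beta = 0.6, 0.5
rho0 = np.array([(1 - beta) / 2, beta / 2, beta / 2, (1 - beta) / 2])
safe = np.array([False, True, True, False])        # W = {q2, q3}
Pq = {"L": np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], float),
      "R": np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]], float)}
like = {"L": np.array([alpha, alpha, 1 - alpha, 1 - alpha]),   # p_o(o_L | q_i)
        "R": np.array([1 - alpha, 1 - alpha, alpha, alpha])}   # p_o(o_R | q_i)

p0, p1 = init_belief(rho0, like["R"])                    # observe o_R at k = 0
reg = p0 + p1
p0, p1 = step_belief(p0, p1, Pq["L"], like["R"], safe)   # apply sigma_L, observe o_R
reg = step_regular(reg, Pq["L"], like["R"])
print(np.round(reg, 3), np.round(p1, 3))
```

Summing the two components `p0 + p1` reproduces the regular belief `reg`, while the safe component `p1` alone carries the extra information about whether the past trajectory stayed in $W$.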

In order to give a compact statement of the dynamic programming equations, consider a
Borel-measurable mapping $f : \mathcal{P}(\tilde{Q}) \times O \times \Sigma \to \mathcal{P}(\tilde{Q})$, with $\tilde{Q} = \{0,1\} \times Q$, defined as

$$f(p, o, \sigma)(h', q') = \Phi(h', q' \mid \Psi(p, \sigma); o, \sigma), \quad (h', q') \in \tilde{Q},$$

where $\Phi$ and $\Psi$ are the abstract filtering operators in (5.12) and (5.13), respectively. Then we have
that $p_k$ is recursively updated according to

$$p_{k+1}(\xi(\rho_0); i_{k+1}) = f(p_k(\xi(\rho_0); i_k), o_{k+1}, \sigma_k),$$

initialized with $p_0(\xi(\rho_0); i_0) = \Phi_0(\cdot \mid \xi(\rho_0); o_0)$, where $\Phi_0$ is as defined in (5.11). By the result of
Lemma 5.1, on a set of executions which occur with probability one, $p_0(\xi(\rho_0); i_0)$ has the
representation in equation (5.35) and the operator $f$ has the representation in equations (5.36), (5.37),
and (5.38).

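Specializing (5.15), the transition kernel $\nu$ characterizing the evolution of the information state
is given by

$$\nu(E \mid p, \sigma) = \sum_{(h,q) \in \{0,1\} \times Q} \; \sum_{q' \in Q} \; \sum_{o \in G(p,\sigma,E)} p_o(o \mid q')\, p_q(q' \mid q, \sigma)\, p(h, q),$$

where $G(p, \sigma, E) = \{o \in O : f(p, o, \sigma) \in E\}$, for $p \in \mathcal{P}(\{0,1\} \times Q)$, $\sigma \in \Sigma$, and
$E \in \mathcal{B}(\mathcal{P}(\{0,1\} \times Q))$. In practical terms, the dynamic programming operator defined in (5.20)
can be rewritten for POMDPs as

$$\mathcal{T}_{\mathrm{Safe}}(J)(p) = \max_{\sigma \in \Sigma} \sum_{(h,q) \in \{0,1\} \times Q} \; \sum_{q' \in Q} \; \sum_{o \in O} J(f(p, o, \sigma))\, p_o(o \mid q')\, p_q(q' \mid q, \sigma)\, p(h, q), \quad p \in \mathcal{P}(\{0,1\} \times Q). \tag{5.39}$$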

By Theorem 5.1, the maximal probability of safety over a finite time horizon $[0, N]$ is computed by
the following dynamic programming recursion:

$$p^*(\rho_0; W) = \sum_{q_0 \in Q} \sum_{o_0 \in O} \mathcal{T}_{\mathrm{Safe}}^{N}(g_N)(p_0(\xi(\rho_0); o_0))\, p_o(o_0 \mid q_0)\, \rho_0(q_0) \tag{5.40}$$

with the terminal cost

$$g_N(p_N) = \sum_{q_N \in W} p_N(1, q_N). \tag{5.41}$$

By the form of the dynamic programming equations (5.39)-(5.41), the partial information
safety problem is a terminal cost problem for the augmented POMDP model $\tilde{\mathcal{H}}_{\mathrm{POMDP}}$ with twice
the number of discrete states as the original POMDP model $\mathcal{H}_{\mathrm{POMDP}}$. Thus, in principle, one can
apply existing computational techniques for POMDP problems to find the optimal control policy
(Russell and Norvig, 2002, Chapter 17; Thrun et al., 2005, Chapter 15). However, as shown by
Papadimitriou and Tsitsiklis (1987) and Lusena et al. (2001), the problem of computing optimal
or even $\epsilon$-optimal control policies for POMDPs is in general PSPACE-complete (note that NP is a
subset of PSPACE), due to the possible exponential increase in complexity of the optimal policy
with respect to the number of discrete states, observations, control actions, and time steps. As
such, exact optimal policies are typically computed only for models with no more than about 20
discrete states.

In the following, we will illustrate these procedures using a concrete example. Consider a
POMDP $\mathcal{H}_{\mathrm{POMDP}}$ with state space $Q = \{q_1, q_2, q_3, q_4\}$, control input space $\Sigma = \{\sigma_L, \sigma_R\}$, and
observation space $O = \{o_L, o_R\}$. The transition probability function $p_q$ for $\mathcal{H}_{\mathrm{POMDP}}$ can be described
in terms of a transition probability matrix $P_q(\sigma) \in [0,1]^{4 \times 4}$ such that $p_q(q_j \mid q_i, \sigma) = P_q(\sigma)(i,j)$ for
every $q_i, q_j \in Q$ and $\sigma \in \Sigma$. For this example, $P_q(\sigma)$ is given below, with the corresponding state
transition diagram as shown in Figure 5.3.

$$P_q(\sigma_L) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \vdots & & & \vdots \end{bmatrix}, \qquad P_q(\sigma_R) = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \end{bmatrix}$$

[Figure 5.3: state transition diagram for the POMDP example.]

The observation probability function $p_o$ for $\mathcal{H}_{\mathrm{POMDP}}$ can be described in terms of an
observation probability matrix $P_o(o) \in [0,1]^{4 \times 4}$ such that $p_o(o \mid q_i) = P_o(o)(i,i)$ for every $q_i \in Q$ and $o \in O$.
For this example, $P_o(o)$ is specified as follows.

$$P_o(o_L) = \mathrm{diag}(\alpha, \alpha, 1-\alpha, 1-\alpha),$$

$$P_o(o_R) = \mathrm{diag}(1-\alpha, 1-\alpha, \alpha, \alpha),$$

where $\alpha \in [0,1]$. The corresponding state observation diagram is as shown in Figure 5.4.

We consider an initial state distribution $\rho_0$ described in terms of a vector $p_0 \in [0,1]^4$ such that
$\rho_0(q_i) = p_0(i)$, $\forall q_i \in Q$:

$$p_0 = \begin{bmatrix} \frac{1-\beta}{2} & \frac{\beta}{2} & \frac{\beta}{2} & \frac{1-\beta}{2} \end{bmatrix}^T,$$

where $\beta \in [0,1]$. The safe set is selected to be $W = \{q_2, q_3\}$.

Figure 5.4: State observation diagram for POMDP example.


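With a representation of $p_k(\xi(\rho_0); i_k)(h, \cdot)$ by vectors $p^h_k \in [0,1]^4$, $h = 0, 1$, the state filtering
equations given in (5.35)-(5.38) can be rewritten in a compact form:

$$p^1_0 = \frac{P_o(o_0)\, p_0}{\mathbf{1}^T P_o(o_0)\, p_0}, \qquad p^0_0 = \begin{bmatrix} 0 & 0 & 0 & 0 \end{bmatrix}^T,$$

$$p^1_{k+1} = \frac{P_o(o_{k+1})\, P_q(\sigma_k)^T M_W\, p^1_k}{\mathbf{1}^T P_o(o_{k+1})\, P_q(\sigma_k)^T (p^0_k + p^1_k)},$$

$$p^0_{k+1} = \frac{P_o(o_{k+1})\, P_q(\sigma_k)^T (p^0_k + M_{Q \setminus W}\, p^1_k)}{\mathbf{1}^T P_o(o_{k+1})\, P_q(\sigma_k)^T (p^0_k + p^1_k)},$$

where $\mathbf{1} = [1\ 1\ 1\ 1]^T$ and $M_W, M_{Q \setminus W} \in [0,1]^{4 \times 4}$ are diagonal matrices given by

$$M_W = \mathrm{diag}(0, 1, 1, 0), \qquad M_{Q \setminus W} = \mathrm{diag}(1, 0, 0, 1).$$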

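Given a time horizon $[0, N]$, $N \geq 1$, the first step of the dynamic programming procedure
described in (5.39) can be carried out as follows.

$$J_{N-1 \to N}(p) = \mathcal{T}_{\mathrm{Safe}}(g_N)(p) = \max_{\sigma \in \Sigma} \sum_{o \in O} g_N(f(p, o, \sigma))\, \mathbf{1}^T P_o(o)\, P_q(\sigma)^T (p^0 + p^1),$$

where $p^h$ denotes the vector $p(h, \cdot) \in [0,1]^4$. By the definitions of $g_N$ and $f$, we then have

$$J_{N-1 \to N}(p) = \max_{\sigma \in \Sigma} \sum_{o \in O} \sum_{q_j \in W} \left( P_o(o)\, P_q(\sigma)^T M_W\, p^1 \right)(j) = \max_{\sigma \in \Sigma} \sum_{q_j \in W} \left( P_q(\sigma)^T M_W\, p^1 \right)(j),$$

where we use the fact that $\sum_{o \in O} P_o(o) = I$. It then follows by the definitions of $P_q(\sigma)$ and $M_W$ that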

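$$J_{N-1 \to N}(p) = \max_{\sigma \in \Sigma} \sum_{q_i \in W} \sum_{q_j \in W} P_q(\sigma)(i,j)\, p(1, q_i) = \begin{cases} p(1, q_2), & p(1, q_2) \geq p(1, q_3) \\ p(1, q_3), & \text{otherwise,} \end{cases}$$

with the corresponding optimal control policy

$$\pi^*_{N-1}(p) = \begin{cases} \sigma_R, & p(1, q_2) \geq p(1, q_3) \\ \sigma_L, & \text{otherwise.} \end{cases}$$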
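The simplification used in this first dynamic programming step, namely that the sum over observations drops out of the one-step backup, can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the transition matrices are an assumed left/right shift chain chosen to be consistent with the policy described above, not the matrices of the original example.

```python
import numpy as np

def backup_bruteforce(p0, p1, Pq, like, safe):
    """T_Safe(g_N)(p) per input: average g_N(f(p, o, sigma)) over observations o."""
    values = {}
    for sigma, P in Pq.items():
        pred1 = P.T @ (safe * p1)
        pred0 = P.T @ (p0 + (~safe) * p1)
        total = 0.0
        for l in like.values():
            prob_o = float((l * (pred0 + pred1)).sum())     # 1^T P_o(o) P_q(sigma)^T (p^0 + p^1)
            if prob_o > 0.0:
                post1 = l * pred1 / prob_o                   # safe component of f(p, o, sigma)
                total += float(post1[safe].sum()) * prob_o   # g_N weighted by P(o)
        values[sigma] = total
    return values

def backup_closed_form(p1, Pq, safe):
    """Simplified form: sum over q_j in W of (P_q(sigma)^T M_W p^1)(j)."""
    return {sigma: float((P.T @ (safe * p1))[safe].sum()) for sigma, P in Pq.items()}

# Hypothetical model data, assumed for illustration only.
alpha = 0.6
safe = np.array([False, True, True, False])          # W = {q2, q3}
Pq = {"L": np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], float),
      "R": np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]], float)}
like = {"L": np.array([alpha, alpha, 1 - alpha, 1 - alpha]),
        "R": np.array([1 - alpha, 1 - alpha, alpha, alpha])}

p0 = np.array([0.05, 0.0, 0.1, 0.05])                # unsafe-history component
p1 = np.array([0.1, 0.4, 0.2, 0.1])                  # safe-history component

brute = backup_bruteforce(p0, p1, Pq, like, safe)
closed = backup_closed_form(p1, Pq, safe)
best_input = max(closed, key=closed.get)
print(brute, closed, best_input)
```

For this assumed model, the value equals the larger of $p(1, q_2)$ and $p(1, q_3)$, and the maximizing input moves the more likely safe state toward the interior of $W$.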

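Using a similar line of reasoning, we can show that the optimal cost-to-go function at time
$k = N - 2$ is given by

$$J_{N-2 \to N} = \mathcal{T}_{\mathrm{Safe}}(J_{N-1 \to N}) = J_{N-1 \to N}.$$

Thus, $J_{N-1 \to N}$ is a fixed point of the operator $\mathcal{T}_{\mathrm{Safe}}$ and

$$J_N(p) = J_{0 \to N}(p) = J_{N-1 \to N}(p), \quad \forall p \in \mathcal{P}(\{0,1\} \times Q),\ N \geq 1.$$

Furthermore, a stationary optimal control policy is given by $\pi^* = (\pi^*_{N-1}, \pi^*_{N-1}, \ldots)$. The
corresponding maximal probability of safety is computed as

$$p^*(\rho_0; W) = \sum_{o_0 \in O} J_N(p_0(\xi(\rho_0); o_0))\, \mathbf{1}^T P_o(o_0)\, p_0 = \begin{cases} \alpha\beta, & \alpha \geq 0.5 \\ (1 - \alpha)\beta, & \text{otherwise.} \end{cases}$$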

To demonstrate the value of the augmented information state, we plot in Figure 5.5 the
realizations of $p_{k|k}$ for the original model $\mathcal{H}_{\mathrm{POMDP}}$ and $p_k$ for the augmented model $\tilde{\mathcal{H}}_{\mathrm{POMDP}}$ in a sample
simulation run under the optimal policy $\pi^*$ over time steps $k = 0, 1, 2$. In this simulation, the
parameters of the model were chosen to be $\alpha = 0.6$ and $\beta = 0.5$. At the initial step, the information
provided by the regular information state $p_{0|0}$ and the augmented information state $p_0$ are
essentially the same, namely $p_{0|0}$ is the component of $p_0$ corresponding to $h_0 = 1$. However, at time step
$k = 1$, due to an erroneous measurement, the regular information state $p_{1|1}$ would seem to suggest
that the most likely system state is $q_3$, while the actual system state $q_2$ has the least likelihood out
of all the states with non-zero probability. If one were to select an input according to this belief,
then perhaps one would choose $\sigma(1) = \sigma_L$, which would render the actual state trajectory unsafe.
On the other hand, the augmented information state splits the distribution $p_{1|1}$ in two, and weights
the conditional probability in each component according to the likelihood that the trajectory has
been safe or unsafe. In particular, the safe component ($h_1 = 1$) of $p_1$ takes into account the fact
that, given the last input was $\sigma(0) = \sigma_L$, the state trajectory could not have been safe if the current
system state were $q_3$. Thus, the state $q_3$ is given a zero weighting in the safe component, resulting
in a correct choice of input $\sigma^*(1) = \sigma_R$ according to $\pi^*$. In fact, one can show that, for this
particular example, as long as the system state is initialized in a safe state (either $q_2$ or $q_3$), and a correct
observation is obtained at time $k = 0$, then the optimal policy $\pi^*$ would ensure a correct choice of
input for all $k \geq 0$, regardless of the realization of the output trajectory.


Figure 5.5: Sample simulation run of POMDP example over three time steps.


5.8 Specialization to Probability Density Models of Stochastic Hybrid Systems

In this section, we consider the more general case of a POdtSHS equipped with a hybrid state
space $S := \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$ and a hybrid observation space $Z = O \times \mathbb{R}^{n_o}$, where
$O := \{o_1, o_2, \ldots, o_m\}$ is the discrete observation space and $n_o : Q \to \mathbb{N}$ is the dimension of the
continuous observation space. To avoid technical complications, it is assumed that the evolution of the
continuous state and the generation of continuous observations in this system are modeled by
nondegenerate probability distributions, so that the stochastic kernels $\nu_x$, $\nu_r$, $\zeta_0$, and $\zeta$ can be described
in terms of probability density functions on Euclidean spaces. More precisely, the assumption on
$\mathcal{H}$ is as follows.

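Assumption 5.1.

1. There exists a Borel-measurable probability density function $p_x : \mathbb{R}^{n(q)} \times S \times C_a \to \mathbb{R}$ such
that for each $s = (q, x) \in S$ and $a \in C_a$,

$$\nu_x(X' \mid s, a) = \int_{X'} p_x(x' \mid (q, x), a)\, dx', \quad \forall X' \in \mathcal{B}(\mathbb{R}^{n(q)});$$

2. There exists a Borel-measurable probability density function $p_r : \mathbb{R}^{n(q')} \times S \times C_a \times Q \to \mathbb{R}$
such that for each $s = (q, x) \in S$, $a \in C_a$, and $q' \in Q$,

$$\nu_r(X' \mid s, a, q') = \int_{X'} p_r(x' \mid (q, x), a, q')\, dx', \quad \forall X' \in \mathcal{B}(\mathbb{R}^{n(q')});$$

3. There exists a Borel-measurable probability density function $p_{z,0} : Z \times S \to \mathbb{R}$ such that for
each $s = (q, x) \in S$,

$$\zeta_0(\{o\} \times Y' \mid s) = \int_{Y'} p_{z,0}((o, y') \mid (q, x))\, dy', \quad \forall o \in O,\ Y' \in \mathcal{B}(\mathbb{R}^{n_o});$$

4. There exists a Borel-measurable probability density function $p_z : Z \times S \times C_a \to \mathbb{R}$ such that
for each $s = (q, x) \in S$ and $a \in C_a$,

$$\zeta(\{o\} \times Y' \mid s, a) = \int_{Y'} p_z((o, y') \mid (q, x), a)\, dy', \quad \forall o \in O,\ Y' \in \mathcal{B}(\mathbb{R}^{n_o}).$$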

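Using the density functions $p_x$ and $p_r$, we can define, for each $s = (q, x) \in S$ and $a \in C_a$, a
Borel-measurable hybrid probability density function $p_s(s' \mid s, a)$ as follows:

$$p_s((q', x') \mid (q, x), a) = \begin{cases} \nu_q(q' \mid (q, x), a)\, p_x(x' \mid (q, x), a), & \text{if } q' = q \\ \nu_q(q' \mid (q, x), a)\, p_r(x' \mid (q, x), a, q'), & \text{if } q' \neq q. \end{cases}$$

Let $S' = \bigcup_{q \in Q} \{q\} \times X_q$ be any Borel subset of the hybrid state space $S$ such that $X_q \in \mathcal{B}(\mathbb{R}^{n(q)})$,
$\forall q \in Q$. Then the hybrid state transition kernel $\nu$ can be characterized in terms of the probability
density function $p_s$ as

$$\nu(S' \mid s, a) = \sum_{q' \in Q} \int_{X_{q'}} p_s((q', x') \mid s, a)\, dx'.$$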
Thus, under Assumption 5.1, a POdtSHS with hybrid observation space can be equivalently
characterized in terms of the tuple $\mathcal{H} = (S, C_a, Z, p_s, p_{z,0}, p_z)$. We will refer to this as a probability
density model of POdtSHS. It can be observed that the probability density functions $p_s$, $p_{z,0}$, and
$p_z$ are analogous to the probability mass functions $p_q$ and $p_o$ in the POMDP case. For a given
safe set $W \in \mathcal{B}(S)$, the corresponding augmented system model $\tilde{\mathcal{H}} = (\tilde{S}, C_a, Z, \tilde{p}_s, \tilde{p}_{z,0}, \tilde{p}_z)$ can be
defined similarly as in (5.34).
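As an illustration of how the hybrid density $p_s$ combines the discrete transition probability with the continuous transition and reset densities, the following sketch evaluates $p_s$ for a two-mode system and checks numerically that it sums and integrates to one over the hybrid state space. Everything about the two-mode system (the switching probability, the Gaussian transition and reset densities, and all function names) is a hypothetical choice made only for this example.

```python
import math

def gauss(x, mean, std=1.0):
    """Scalar normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def hybrid_density(q_next, x_next, q, x, a, nu_q, p_x, p_r):
    """p_s((q', x') | (q, x), a): discrete transition probability times a continuous density."""
    weight = nu_q(q_next, q, x, a)
    if q_next == q:
        return weight * p_x(x_next, q, x, a)        # no mode change: continuous evolution
    return weight * p_r(x_next, q, x, a, q_next)    # mode change: reset density

# Hypothetical two-mode system, assumed for illustration only.
def nu_q(q_next, q, x, a):
    return 0.9 if q_next == q else 0.1

def p_x(x_next, q, x, a):
    return gauss(x_next, x + a)

def p_r(x_next, q, x, a, q_next):
    return gauss(x_next, 0.0, std=0.5)

def total_mass(q, x, a, lo=-12.0, hi=12.0, n=4801):
    """Riemann-sum approximation of the total mass of p_s(. | (q, x), a) over S."""
    dx = (hi - lo) / (n - 1)
    return sum(hybrid_density(qn, lo + i * dx, q, x, a, nu_q, p_x, p_r) * dx
               for qn in (0, 1) for i in range(n))

print(total_mass(0, 0.3, 0.2))
```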

To be consistent with Assumption 5.1, we will consider initial state distributions $\rho_0 \in \mathcal{P}(S)$
which can be characterized in terms of a Borel-measurable probability density function $p_0 : S \to \mathbb{R}$
such that

$$\rho_0(\{q\} \times X') = \int_{X'} p_0(q, x)\, dx, \quad \forall q \in Q,\ X' \in \mathcal{B}(\mathbb{R}^{n(q)}).$$

With a slight abuse of notation, we will denote by $\xi(p_0)$ the probability density function associated
with the initial state distribution $\xi(\rho_0)$ on the augmented state space. Then, given a policy
$\pi' \in \Pi'$, the probability density functions $p_0$, $\tilde{p}_s$, $\tilde{p}_{z,0}$, and $\tilde{p}_z$ induce a unique probability measure
$P_k(\pi', \xi(\rho_0))$ on the sample space $\Omega_k := \tilde{S}^{k+1} \times Z^{k+1} \times C_a^k$.

It turns out that, for the class of POdtSHS considered here, the filtering equations given in
section 5.3.2 specialize to the Bayesian update rule for the conditional probability density of
the augmented state $\tilde{s}_k$ given $i_k \in I_k$ and $\rho_0 \in \mathcal{P}(S)$. This is analogous to the update rule for the
conditional probability mass function in the POMDP case. For compactness of notation, we denote
the integration of a Borel-measurable function $F : \tilde{S} \to \mathbb{R}$ over $\tilde{S}$ as

$$\int_{\tilde{S}} F(\tilde{s})\, d\tilde{s} := \sum_{h \in \{0,1\}} \sum_{q \in Q} \int_{\mathbb{R}^{n(q)}} F(h, q, x)\, dx.$$

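Lemma 5.2. Let $\mathcal{H} = (S, C_a, Z, p_s, p_{z,0}, p_z)$ be a probability density model of POdtSHS and
$W \in \mathcal{B}(S)$ be a Borel safe set. Let $\tilde{\mathcal{H}} = (\tilde{S}, C_a, Z, \tilde{p}_s, \tilde{p}_{z,0}, \tilde{p}_z)$ be the corresponding augmented
POdtSHS. Let $p_0 : S \to \mathbb{R}$ be a Borel-measurable initial state density. Then for every $\pi' \in \Pi'$ and
$k = 0, 1, \ldots$, the stochastic kernel $p_k(\xi(\rho_0); i_k)$ has the probability density $\hat{p}_k(\cdot \mid \xi(p_0); i_k) : \tilde{S} \to \mathbb{R}$
given by

$$\hat{p}_0(\tilde{s}_0 \mid \xi(p_0); i_0) = \frac{\tilde{p}_{z,0}(z_0 \mid \tilde{s}_0)\, \xi(p_0)(\tilde{s}_0)}{\int_{\tilde{S}} \tilde{p}_{z,0}(z_0 \mid \tilde{s}_0')\, \xi(p_0)(\tilde{s}_0')\, d\tilde{s}_0'},$$

$$\hat{p}_k(\tilde{s}_k \mid \xi(p_0); i_k) = \frac{\tilde{p}_z(z_k \mid \tilde{s}_k, a_{k-1})\, \hat{p}_{k|k-1}(\tilde{s}_k \mid \xi(p_0); i_{k-1}, a_{k-1})}{\int_{\tilde{S}} \tilde{p}_z(z_k \mid \tilde{s}_k', a_{k-1})\, \hat{p}_{k|k-1}(\tilde{s}_k' \mid \xi(p_0); i_{k-1}, a_{k-1})\, d\tilde{s}_k'},$$

where

$$\hat{p}_{k|k-1}(\tilde{s}_k \mid \xi(p_0); i_{k-1}, a_{k-1}) = \int_{\tilde{S}} \tilde{p}_s(\tilde{s}_k \mid \tilde{s}_{k-1}, a_{k-1})\, \hat{p}_{k-1}(\tilde{s}_{k-1} \mid \xi(p_0); i_{k-1})\, d\tilde{s}_{k-1},$$

for $P_k(\pi', \xi(\rho_0))$-almost every $i_k$.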


The proof of this result is again somewhat technical in nature and can be found in appendix
D. Using the definitions of $\tilde{p}_s$, $\tilde{p}_{z,0}$, and $\tilde{p}_z$, the update equations for the probability density in the
statement of Lemma 5.2 can be rewritten as follows.

1. Initialization Step:

$$\hat{p}_0(1, s_0 \mid \xi(p_0); i_0) = \frac{p_{z,0}(z_0 \mid s_0)\, p_0(s_0)}{\int_S p_{z,0}(z_0 \mid s_0')\, p_0(s_0')\, ds_0'}, \qquad \hat{p}_0(0, s_0 \mid \xi(p_0); i_0) = 0; \tag{5.42}$$

2. Prediction Step:

$$\hat{p}_{k+1|k}(\xi(p_0); i_k, a_k)(1, s_{k+1}) = \int_W p_s(s_{k+1} \mid s_k, a_k)\, \hat{p}_k(1, s_k \mid \xi(p_0); i_k)\, ds_k, \tag{5.43}$$

$$\hat{p}_{k+1|k}(\xi(p_0); i_k, a_k)(0, s_{k+1}) = \int_S p_s(s_{k+1} \mid s_k, a_k)\, \hat{p}_k(0, s_k \mid \xi(p_0); i_k)\, ds_k + \int_{S \setminus W} p_s(s_{k+1} \mid s_k, a_k)\, \hat{p}_k(1, s_k \mid \xi(p_0); i_k)\, ds_k; \tag{5.44}$$

3. Update Step:

$$\hat{p}_{k+1}(h_{k+1}, s_{k+1} \mid \xi(p_0); i_{k+1}) = \frac{p_z(z_{k+1} \mid s_{k+1}, a_k)\, \hat{p}_{k+1|k}(\xi(p_0); i_k, a_k)(h_{k+1}, s_{k+1})}{\sum_{h_{k+1}' \in \{0,1\}} \int_S p_z(z_{k+1} \mid s_{k+1}', a_k)\, \hat{p}_{k+1|k}(\xi(p_0); i_k, a_k)(h_{k+1}', s_{k+1}')\, ds_{k+1}'}. \tag{5.45}$$

As in the case of a POMDP, the estimation procedure given in (5.42)-(5.45) for a probability
density model of POdtSHS involves maintaining a conditional probability density function
over a hybrid state space with twice the number of discrete states as in the original hybrid system
model. However, while the probability distribution for a POMDP is a vector of probabilities, and
hence finite dimensional, the hybrid probability density is a real-valued function over the
continuous state space within each mode, and hence infinite dimensional. It can also be verified that
marginalizing over the history state $h_k$ recovers the conditional probability density function of the
hybrid state $s_k$, as produced by Bayesian update equations (see for example Kumar and Varaiya,
1986, chapter 5). Specifically, let $\hat{p}_{k|k}(\cdot \mid p_0; i_k)$ be the conditional probability density of $s_k$ over $S$,
given the initial probability density $p_0$ and the information vector $i_k$. Then we have

$$\hat{p}_{k|k}(s_k \mid p_0; i_k) = \hat{p}_k(\xi(p_0); i_k)(1, s_k) + \hat{p}_k(\xi(p_0); i_k)(0, s_k), \quad \forall s_k \in S.$$

In essence, the augmented probability density $\hat{p}_k$ splits the regular probability density into two
components, and weights each according to the likelihood that the past trajectory has been safe.
To illustrate the form of the augmented probability density, we consider a simple example of a 1-D
linear Gaussian system, as described below:

$$x_{k+1} = x_k + u_k + w_k,$$

$$y_k = x_k + v_k,$$

where $x_0$, $w_k$, $v_k$ are random variables with standard normal distribution $\mathcal{N}(0, 1)$. The
probability density model of this system can then be derived as $p_x(x_{k+1} \mid x_k, u_k) = \mathcal{N}(x_k + u_k, 1)$ and
$p_z(y_k \mid x_k, u_{k-1}) = \mathcal{N}(x_k, 1)$ (the reset kernel coincides with the continuous state transition kernel
given that there is only one mode). It can be seen that the regular information state $\hat{p}_{k|k}$ in this case
is simply the output of the Kalman filter for $\mathcal{H}$. Selecting the safe set as the interval $W = [-1, 1]$,
we simulate the system forward in time and plot the realizations of the Kalman filter output and the
augmented information state $\hat{p}_k$ for the augmented model under somewhat arbitrary choices of control
input. In this case, the components of the augmented density function $\hat{p}_k$ are computed by numerical
integration using equations (5.42)-(5.45). The results of a sample simulation run over time steps
$k = 0, 1, 2$ are shown in Figure 5.6.


Figure 5.6: Sample simulation run of linear Gaussian example over three time steps.
(a) Information states at $k = 0$ ($x_0 = 0.67$, $y_0 = 1.86$, $u_0 = 0.2$).
(b) Information states at $k = 1$ ($x_1 = -0.33$, $y_1 = 0.65$, $u_1 = -0.5$).
(c) Information states at $k = 2$ ($x_2 = -0.99$, $y_2 = -1.94$, $u_2 = 0.3$).
As in the case of discrete state systems, the information provided by the Kalman filter and the
augmented probability density are the same at the first time step, namely the safe component
($h_0 = 1$) of the augmented probability density coincides with the regular conditional density. However, at
the second or third time step, the augmented probability density splits the Kalman filter output into two
components according to the likelihood that the state history $x_0$ or $(x_0, x_1)$ was in the safe set. Note
that this is a hybrid probability density on the augmented state space $\{0, 1\} \times \mathbb{R}$, even though the
original system is continuous. While the output of a Kalman filter is Gaussian, it is no longer clear
whether the components of the augmented density are in fact Gaussian, although they appear to
exhibit Gaussian characteristics in Figure 5.6.
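The numerical integration of (5.42)-(5.45) for this 1-D example can be sketched with simple Riemann sums on a grid. This is only one possible discretization, given as an illustration: the grid bounds and resolution, the function names, and the use of the measurement and input values printed in Figure 5.6 are all assumptions of this sketch. The Kalman filter means are computed alongside as a check of the marginalization property.

```python
import numpy as np

grid = np.linspace(-8.0, 8.0, 801)      # hypothetical integration grid
dx = grid[1] - grid[0]
in_W = np.abs(grid) <= 1.0              # safe set W = [-1, 1]

def gauss(x, mean):
    """Unit-variance normal density N(x; mean, 1)."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def initialize(y0):
    """(5.42): prior N(0, 1) times likelihood of y0; the unsafe component is zero."""
    w = gauss(y0, grid) * gauss(grid, 0.0)
    return np.zeros_like(w), w / (w.sum() * dx)

def step(p0, p1, u, y_next):
    """(5.43)-(5.45): prediction through p_x, then update with the new measurement."""
    kernel = gauss(grid[:, None], grid[None, :] + u)   # kernel[i, j] = p_x(grid[i] | grid[j], u)
    pred1 = kernel @ (in_W * p1) * dx                  # safe history, previous state in W
    pred0 = kernel @ (p0 + (~in_W) * p1) * dx          # unsafe history, or leaving W
    like = gauss(y_next, grid)
    w0, w1 = like * pred0, like * pred1
    norm = (w0.sum() + w1.sum()) * dx
    return w0 / norm, w1 / norm

def kalman_means(ys, us):
    """Scalar Kalman filter means for the same model, for comparison."""
    mean, var = ys[0] / 2.0, 0.5
    means = [mean]
    for u, y in zip(us, ys[1:]):
        mean, var = mean + u, var + 1.0
        gain = var / (var + 1.0)
        mean, var = mean + gain * (y - mean), (1.0 - gain) * var
        means.append(mean)
    return means

ys, us = [1.86, 0.65, -1.94], [0.2, -0.5]   # values printed in Figure 5.6
p0, p1 = initialize(ys[0])
grid_means = [float(((p0 + p1) * grid).sum() * dx)]
for u, y in zip(us, ys[1:]):
    p0, p1 = step(p0, p1, u, y)
    grid_means.append(float(((p0 + p1) * grid).sum() * dx))
print(grid_means, kalman_means(ys, us), float(p1.sum() * dx))
```

The last printed number is the mass of the safe component, that is, the conditional probability that the state history has remained in $W$ given the measurements so far.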

To derive a compact form of the dynamic programming equations, we again consider a
Borel-measurable mapping $f : \mathcal{P}(\tilde{S}) \times Z \times C_a \to \mathcal{P}(\tilde{S})$ given by

$$f(p, z, a) = \Phi(d\tilde{s} \mid \Psi(p, a); z, a),$$

where $\Phi$ and $\Psi$ are the abstract filtering operators in (5.12) and (5.13), respectively. Then we have
that

$$p_{k+1}(\xi(\rho_0); i_{k+1}) = f(p_k(\xi(\rho_0); i_k), z_{k+1}, a_k),$$

initialized with $p_0(\xi(\rho_0); i_0) = \Phi_0(\cdot \mid \xi(\rho_0); z_0)$, where $\Phi_0$ is as defined in (5.11). By the result
of Lemma 5.2, on a set of executions which occur with probability one, $p_0(\xi(\rho_0); i_0)$ has the
probability density given in equation (5.42) and $p_k(\xi(\rho_0); i_k)$, $k \geq 1$, has the probability density
given in equation (5.45).

The dynamic programming operator $\mathcal{T}_{\mathrm{Safe}}$ can be rewritten for a probability density model $\mathcal{H}$
of POdtSHS as follows.

$$\mathcal{T}_{\mathrm{Safe}}(J)(p) = \sup_{a \in C_a} \sum_{h \in \{0,1\}} \int_S \int_S \left( \int_Z J(f(p, z, a))\, p_z(z \mid s', a)\, dz \right) p_s(s' \mid s, a)\, ds'\, \hat{p}(h, s)\, ds, \tag{5.46}$$

where $\hat{p}$ is the probability density associated with the information state $p \in \mathcal{P}(\tilde{S})$. By Theorem 5.1,
the maximal probability of safety for $\mathcal{H}$ over time horizon $[0, N]$ is then computed by the following
dynamic programming recursion:

$$p^*(\rho_0; W) = \int_S \int_Z \mathcal{T}_{\mathrm{Safe}}^{N}(g_N)(p_0(\xi(\rho_0); z_0))\, p_{z,0}(z_0 \mid s_0)\, p_0(s_0)\, dz_0\, ds_0 \tag{5.47}$$

with the terminal cost

$$g_N(p_N) = \int_W \hat{p}_N(1, s_N)\, ds_N. \tag{5.48}$$

As can be seen from equations (5.46)-(5.48), the computation of optimal safety probability for $\mathcal{H}$
in general needs to be carried out on the space of augmented probability density functions. Thus,
finding computationally tractable algorithms for the synthesis of optimal control policies hinges on
the existence of finite dimensional representations of the hybrid probability density $\hat{p}_k$ in particular
instances of the POdtSHS model.
Chapter 6
Conclusions


In  many  application  scenarios  found  in  practice,  the  overall  system  behavior  features  elements 
from  both  the  discrete  and  continuous  domain.  While  a  hybrid  system  model  provides  a  natural 
abstraction  for  such  behaviors,  the  problem  of  controller  synthesis  for  hybrid  systems  suffers  from 
both  theoretical  and  computational  difficulties.  The  results  presented  in  this  dissertation,  whether 
theoretical,  computational,  or  experimental,  can  be  viewed  as  part  of  an  overall  effort  towards 
systematic  design  and  algorithmic  synthesis  of  feedback  control  policies  to  satisfy  reachability 
specifications  for  hybrid  system  abstractions  of  control  systems.  The  methods  that  we  have  devel¬ 
oped  vary  in  terms  of  the  model  of  interaction  between  the  discrete  and  continuous  dynamics,  as 
well  as  of  the  model  of  uncertainty  in  the  system  behavior,  and  are  to  a  large  degree  motivated  by 
the  needs  of  varying  application  scenarios.  In  the  following,  we  will  provide  a  summary  of  our 
main  results,  along  with  a  discussion  of  directions  for  future  work. 

6.1  Summary 

The  controller  design  and  synthesis  methods  developed  in  the  first  part  of  this  dissertation  are 
concerned  with  deterministic  switched  nonlinear  systems  in  which  the  discrete  transitions  are  con¬ 
trolled.  These  methods  can  be  viewed  as  translations  of  the  abstract  controller  synthesis  algorithms 
for general hybrid systems as proposed in Lygeros et al. (1999b) and Tomlin et al. (2000), which
have  been  largely  applied  manually  on  a  case  by  case  basis,  to  systematic  design  procedures  and 
computational synthesis techniques for subclasses of hybrid system models found in practical
applications. In particular, we considered switched system models with two different interpretations
of  the  discrete  transitions  as  either  the  temporal  phases  of  a  dynamic  process,  or  the  control  choices 
available  to  a  high  level  controller.  In  the  case  of  the  former,  a  hybrid  system  formalism  is  proposed 
for  sequential  transition  systems,  whose  discrete  transitions  follow  a  pre-defined  sequence  and  can 
be  controlled  either  by  automation,  or  by  an  external  human  operator.  For  this  class  of  systems, 
a  systematic  procedure  is  presented  for  designing  continuous  control  laws  and  discrete  switching 
conditions to satisfy sequential reachability specifications, namely specifications consisting of a
sequence of safety or reach-avoid objectives. This procedure uses the method of Hamilton-Jacobi
reachability analysis, as developed by Mitchell et al. (2005) for nonlinear continuous time systems,
to  compute  reachable  sets  which  provide  information  on  the  satisfaction  of  a  safety  or  reach-avoid 
objective  under  a  given  continuous  feedback  policy,  as  well  as  the  domain  on  which  it  is  satisfied. 
This  information  is  then  used  to  design  a  feedback  policy,  which  consists  of  both  a  continuous  and 
a  discrete  component,  to  achieve  individual  reachability  objectives  within  the  respective  modes, 
while  assuring  compatibility  across  mode  transitions.  The  design  procedure  is  applied  to  the  ex¬ 
ample  of  automated  aerial  refueling  (AAR).  Simulation  studies  show  that  the  resulting  control 
design  satisfies  the  desired  safety  and  target  attainability  objectives,  and  that  there  is  a  possibility 
for  using  reachable  sets  as  a  guidance  tool  for  decision  making  by  a  human  operator. 

For  switched  systems  in  which  the  discrete  modes  represent  high  level  control  choices,  we  de¬ 
scribed  methods  for  performing  algorithmic  controller  synthesis  to  satisfy  safety  and  reach-avoid 
objectives,  within  a  sampled-data  setting,  and  subject  to  bounded  continuous  disturbances.  In  par¬ 
ticular,  under  a  discretization  of  the  continuous  input  space,  computational  reachability  analysis  is 
performed  over  successive  sampling  intervals  (once  again  using  the  Hamilton-Jacobi  method),  to 
find  the  set  of  initial  conditions  for  which  a  given  reachability  specification  is  feasible,  assuming 
feedback  selections  of  the  discrete  mode  and  continuous  input,  and  worst-case  disturbance  behav¬ 
ior.  From  the  results  of  this  reachability  analysis,  a  set-valued  feedback  law  is  derived  over  the  time 
horizon  of  interest,  represented  as  a  finite  collection  of  reachable  sets,  along  with  an  algorithm  for 
performing  online  control  selections  with  respect  to  the  computed  control  law.  This  synthesis  tech¬ 
nique  is  applied  in  an  experimental  setting  to  the  problem  of  controlling  a  quadrotor  to  reach  and 
then  remain  in  a  hover  region  over  a  moving  ground  target.  The  experimental  results  show  that 
the  reachability-based  control  laws  exhibit  strong  robustness  properties  and  are  for  the  most  part 
capable  of  ensuring  the  desired  specifications,  excepting  occasional,  brief  violations  due  to  under¬ 
estimation  in  the  disturbance  bound.  It  is  also  discussed  how  the  methodology  can  be  extended  to 
perform  controller  synthesis  for  sequential  reachability  problems,  by  appropriately  incorporating 
the  design  procedure  of  chapter  2  to  ensure  compatibility  across  the  different  phases  of  the  reacha¬ 
bility  specification.  This  combined  design  approach  is  applied  to  the  experimental  example  for  the 
two  phases  of  reach  and  hover,  as  well  as  to  the  AAR  example  in  a  simulation  study,  for  the  phases 
of  the  refueling  sequence. 

The  theoretical  and  computational  tools  developed  in  the  second  part  of  this  dissertation  are 
concerned  with  discrete  time  stochastic  hybrid  systems  (DTSHS),  as  motivated  by  applications  in 
which  system  models  are  derived  from  statistical  analysis  or  assumptions.  Based  upon  prior  work 
by Amin et al. (2006), Abate et al. (2006), and Summers et al. (2011) on probabilistic reachability
problems  for  DTSHS,  we  considered  two  extensions  to  account  for  different  models  of  uncertainty. 
In the first extension, two-player stochastic game formulations of the probabilistic reachability
problem are analyzed, in terms of a model which we referred to as a discrete-time stochastic hybrid
game  (DTSHG).  These  formulations  feature  a  control  whose  objective  is  to  achieve  either  a  proba¬ 
bilistic  safety  or  reach-avoid  objective,  and  an  adversarial  disturbance  whose  objective  is  assumed 
to  be  opposed  to  that  of  the  control.  Our  analysis  of  these  formulations  generalizes  the  stochastic 
optimal  control  argument  used  by  Amin  et  al.  (2006),  Abate  et  al.  (2006),  and  Summers  et  al. 
(2011)  for  the  single  player  case,  while  also  adapting  results  from  the  literature  on  additive  cost 
stochastic  games  (see  for  example  Kumar  and  Shiau,  1981;  Nowak,  1985;  Gonzalez-Trejo  et  al., 
2002) to the multiplicative payoff structure of the safety and reach-avoid problems. For a
feedback Stackelberg formulation, with an asymmetric information pattern favoring the disturbance,
a  dynamic  programming  algorithm  is  given  for  the  computation  of  the  max-min  probability  of 
satisfying  either  the  safety  or  the  reach-avoid  specifications,  as  the  Stackelberg  value.  Sufficient 
conditions  of  optimality  are  also  derived  for  the  synthesis  of  a  Stackelberg  solution  for  the  control, 
as  a  deterministic  Markov  policy.  The  Stackelberg  value  is  later  shown  to  be  the  lower  value  of 
a  symmetric  feedback  Nash  game,  whose  equilibrium  solutions  are  in  general  found  within  the 
class  of  randomized  policies.  Some  results  on  infinite  horizon  reachability  computation  are  also 
provided,  as  related  to  the  approximation  of  infinite  horizon  value  and  the  existence  of  infinite  hori¬ 
zon  optimal  policies.  The  utility  of  this  methodology  is  illustrated  using  a  two  aircraft  collision 
avoidance  example,  in  which  the  adversarial  uncertainty  is  due  to  the  unknown  intent  of  an  un¬ 
controlled  aircraft,  and  the  stochastic  uncertainty  is  due  to  wind  effects.  Probabilistic  reachability 
computations  are  performed  to  determine  a  conflict  resolution  strategy  which  provides  probabilis¬ 
tic  assurances  of  safety. 

The  second  extension  involves  the  consideration  of  partial  information  in  probabilistic  reach¬ 
ability  problems,  as  pertaining  to  application  scenarios  in  which  control  decisions  are  to  be  made 
with  respect  to  uncertainties  in  the  state  estimate  or  measurement.  These  uncertainties  can  arise 
from  limited  sensor  placements,  lack  of  sensor  precision,  or  measurement  noise.  As  compared 
with  the  large  body  of  previous  work  on  perfect  information  reachability  problems,  there  have 
been  relatively  few  studies  on  the  issue  of  imperfect  state  information  in  the  literature  on  hybrid 
system  reachability.  In  this  dissertation,  we  investigated  partial  information  safety  and  reach-avoid 
problems within the context of a partially observable discrete time stochastic hybrid system
(POdtSHS), which formally accounts for the imperfections in state measurement through a probabilistic
observation  model.  Our  analysis  shows  that  a  sufficient  statistic  for  the  partial  information  safety 
or  reach-avoid  problems  maintains  inferred  knowledge  about  both  the  current  state  of  the  hybrid 
system  and  the  safety  of  past  state  evolution.  The  added  information,  as  encoded  in  a  binary  ran¬ 
dom  variable  augmenting  the  hybrid  state  space,  differentiates  the  partial  information  reachability 
problems  from  conventional  additive  cost  partial  information  problems,  such  as  the  LQG  problem. 
Through  a  sequence  of  transformations  which  reduces  the  original  partial  information  reachability 
problems  to  perfect  information  terminal  cost  or  additive  cost  problems  on  the  space  of  informa¬ 
tion  states,  we  apply  the  results  of  Bertsekas  and  Shreve  (1978)  to  derive  a  dynamic  programming 
algorithm.  It  is  then  shown  in  the  case  of  a  POMDP  model,  with  discrete  state,  control,  and  obser¬ 
vation  spaces,  the  information  state  is  a  probability  distribution  over  twice  the  number  of  discrete 
states  as  the  original  model,  and  hence  finite  dimensional.  However,  in  the  case  of  a  stochastic 
hybrid  system  model  with  probability  density  descriptions  over  continuous  state  spaces,  the  space 
of  information  states  is  in  general  infinite  dimensional.  Thus,  computational  solutions  would  have 
to  be  found  in  particular  instances  with  finite  dimensional  representations  or  approximations  for 
the  hybrid  conditional  density. 
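To make the finite dimensional POMDP case more concrete, the following is a minimal sketch of one Bayes-filter step on a state augmented with a binary safety flag. The encoding of the flag, the function name, and the dictionary-based model layout are assumptions made for illustration; the precise construction used in chapter 5 may differ.

```python
from itertools import product


def update_information_state(belief, u, y, states, trans, obs, safe):
    """One Bayes-filter step on the augmented state (x, s).

    belief: dict mapping (x, s) -> probability, where s = 1 means the
            trajectory has stayed in the safe set so far (assumed encoding).
    trans:  trans[x][u][x_next] = transition probability (hypothetical model).
    obs:    obs[x_next][y] = probability of observing y in state x_next.
    safe:   set of safe discrete states.
    """
    new_belief = {(x, s): 0.0 for x, s in product(states, (0, 1))}
    for (x, s), p in belief.items():
        if p == 0.0:
            continue
        for x_next in states:
            # The flag stays 1 only if it was 1 and the new state is safe.
            s_next = 1 if (s == 1 and x_next in safe) else 0
            new_belief[(x_next, s_next)] += p * trans[x][u][x_next] * obs[x_next][y]
    total = sum(new_belief.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {key: value / total for key, value in new_belief.items()}
```

With n discrete states the belief has 2n entries, which matches the doubling of the state count described above.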
6.2 Future Work


While we have taken some important first steps, the transition of reachability-based controller
design methods from research and development to implementation in practical applications will
require resolving a number of theoretical and computational issues. While some of these issues
were alluded to in the individual chapters, we highlight here some of the main challenges and
offer some ideas for future research.

6.2.1  Approximation  of  Deterministic  Reachable  Sets 

By using the Hamilton-Jacobi approach for reachable set computation within each mode of a
switched system, the controller design methods described in Part I of this dissertation can
accommodate system models with up to four or five continuous state dimensions. While this may
be sufficient for abstractions of control systems with a small number of continuous states, as in the
aircraft conflict resolution and AAR examples, or applications in which there is decoupling in the
continuous dynamics, as in the quadrotor target tracking example, there may be practical scenarios
in which one may not be able to capture the richness of continuous behavior using a low dimensional
model. For such cases it is important to investigate alternative methods for the reachable set
computation.

Because reachable sets for general nonlinear systems can exhibit increasingly complex
shapes as one allows for increasing degrees of freedom, it comes as no surprise that numerical
techniques for approximating such sets in general feature a trade-off between accuracy and
computational efficiency. While the Hamilton-Jacobi method can provide highly accurate approximations
of reachable sets for low dimensional systems, it is also limited by the exponential growth of its
computational complexity with the state dimension. Some proposed methods for approximation of
reachable sets for nonlinear systems include Mitchell and Tomlin (2003), Stipanovic et al. (2004),
Hwang et al. (2005), and Mitchell (2011), with varying forms of reachable set representation and
levels of conservatism in the approximation. It would be interesting to investigate the possible use
of these methods for the reachability computations described in chapters 2 and 3, with appropriate
modifications of the controller design techniques to account for approximation errors.
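The exponential growth mentioned above can be illustrated with a short calculation. The resolution of 100 grid points per dimension and the storage of 8 bytes per node are illustrative assumptions, not figures taken from the experiments in this dissertation.

```python
def grid_nodes(points_per_dim: int, dim: int) -> int:
    """Number of nodes in a uniform grid with the same resolution per axis."""
    return points_per_dim ** dim


def grid_memory_gib(points_per_dim: int, dim: int, bytes_per_node: int = 8) -> float:
    """Memory needed to store one scalar value per grid node, in GiB."""
    return grid_nodes(points_per_dim, dim) * bytes_per_node / 2**30


# Print dimension, node count, and memory for a few state dimensions.
for dim in range(2, 7):
    print(dim, grid_nodes(100, dim), round(grid_memory_gib(100, dim), 3))
```

Under these assumptions, a single value function in five dimensions already needs tens of GiB, which is consistent with the practical limit of four or five continuous states.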

For hybrid system models whose continuous dynamics are linear, a number of alternative reachability
analysis techniques are available for the computation of approximate reachable sets in continuous
time, based upon representations such as polyhedra (Asarin et al., 2000a; Chutinan and
Krogh, 2003), ellipsoids (Kurzhanski and Varaiya, 2000), and zonotopes (Girard, 2005). More
recently, a method has been proposed by Kaynama and Oishi (2011) for approximate reachability
analysis of linear time invariant systems using Schur-based decomposition. While not all of
these methods explicitly consider dynamic game formulations of reachability problems, it should
be noted that the design procedures of chapter 2 and the controller synthesis algorithms of chapter 3
only require reachable set computations under differential inclusions (with respect to the
disturbance input). However, as noted by Mitchell (2007b), there are subtle differences between
these methods in their abilities to handle reachable set computation under existentially quantified
inputs or universally quantified inputs. Given that our design method employs both capture sets,
which are computed under universally quantified disturbance inputs, and unsafe sets, which are
computed under existentially quantified disturbance inputs, one should be careful to select
an approximation method which allows for both types of reachability computations.
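The distinction between the two quantifiers can be seen in a discrete-time, finite-state analogue. The sketch below computes one-step predecessor sets under an existentially and a universally quantified disturbance. The toy transition model is an assumption for illustration, and the control input that appears in the actual capture set computation is omitted for brevity.

```python
def pre_exists(target, states, disturbances, step):
    """States from which SOME disturbance drives the next state into target."""
    return {s for s in states if any(step(s, d) in target for d in disturbances)}


def pre_forall(target, states, disturbances, step):
    """States from which EVERY disturbance drives the next state into target."""
    return {s for s in states if all(step(s, d) in target for d in disturbances)}


# Toy model (hypothetical): integer position on 0..5 drifting right by one
# cell per step, with a disturbance that may add one extra cell.
states = set(range(6))
disturbances = (0, 1)


def step(s, d):
    return min(s + 1 + d, 5)


unsafe_one_step = pre_exists({5}, states, disturbances, step)
capture_one_step = pre_forall({5}, states, disturbances, step)
```

The existential set is always a superset of the universal one, so a representation that only over-approximates, or only under-approximates, cannot be conservative for both computations at once.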

6.2.2  Approximation  of  Probabilistic  Reachability  Computations 

As in the deterministic case, the applicability of the controller design methods
described in chapter 4 for DTSHG models depends on one's ability to approximate the max-min
safety and reach-avoid probabilities. Using a piecewise constant approximation, as adapted from
the approach described in Abate et al. (2007) for the single player case, we have been able to
perform probabilistic reachability computations in low dimensional examples. However, this method
suffers from a similar type of exponential growth in complexity as the Hamilton-Jacobi method,
due to the choice of a uniform grid over the hybrid state space. Thus, a more computationally
efficient approximation algorithm, with provable bounds on the approximation error, will be needed
for problems with large continuous state dimensions. A possible approach is to investigate
extensions of methods that have been developed in the realm of approximate dynamic programming.
For example, various approaches have been proposed for using adaptive gridding of the state space
(Munos and Moore, 2002), or parameterized families of basis functions (Bertsekas and Tsitsiklis,
1996; de Farias and van Roy, 2003; Kveton et al., 2006) to approximate the optimal value
function in deterministic or stochastic optimal control problems. The difficulty in applying these
approaches, however, lies in finding suitable adaptive grids and basis functions which result in
accurate and tractable computation algorithms for probabilistic reachability problems. One effort
in this direction can be found in the work of Esmaeil Zadeh Soudjani and Abate (2011), which
describes an adaptive mesh refinement method for the approximation of the optimal safety
probability of a DTSHS in the single player case. Another approach that has been proposed recently
by McEneaney (2011) interprets the dynamic programming operator for stochastic optimal control
problems as an abstract semigroup operator on a max-plus algebra. Through this viewpoint, the
value function for certain classes of problems, such as those with affine dynamics and additive
quadratic cost functions, can be represented through a pointwise minimum of quadratic functions.
It would be of interest to investigate whether this approach can be extended to the multiplicative
indicator cost functions encountered in probabilistic reachability problems.
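For reference, a grid-based max-min recursion of the kind discussed above can be sketched on a small finite state space. This is a simplified illustration rather than the algorithm of chapter 4: the function name, the one-dimensional toy kernel, and the finite action sets are all assumptions.

```python
def maxmin_safety(n_states, safe, actions, disturbances, kernel, horizon):
    """Max-min probability of staying in `safe` for `horizon` steps.

    kernel(s, a, b) returns a dict {next_state: probability}. The control a
    is chosen first and the adversary b responds to it, mirroring an
    asymmetric information pattern (assumed here for illustration).
    """
    value = [1.0 if s in safe else 0.0 for s in range(n_states)]
    for _ in range(horizon):
        new_value = []
        for s in range(n_states):
            if s not in safe:
                new_value.append(0.0)  # multiplicative indicator cost
                continue
            best = max(
                min(
                    sum(p * value[s2] for s2, p in kernel(s, a, b).items())
                    for b in disturbances
                )
                for a in actions
            )
            new_value.append(best)
        value = new_value
    return value


# Toy model (hypothetical): a chain of 7 cells with cells 1..5 safe.
safe = {1, 2, 3, 4, 5}
actions = (-1, 0, 1)
disturbances = (-1, 0, 1)


def kernel(s, a, b):
    out = {}
    for noise, p in ((-1, 0.25), (0, 0.5), (1, 0.25)):
        s2 = max(0, min(6, s + a + b + noise))
        out[s2] = out.get(s2, 0.0) + p
    return out
```

Each backward step touches every grid cell, every action pair, and every successor cell, which is where the exponential dependence on the state dimension enters once the grid is a product of per-axis grids.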

6.2.3  Computational  Approaches  to  Partial  Information  Probabilistic 
Reachability  Problems 

While the results of chapter 5 provide us with important insights into the structure of partial
information safety and reach-avoid problems, they also serve to highlight the challenge of optimal
control when we do not have accurate measurements or estimates of the system state. In particular,
even in the case that the system only features continuous dynamics (e.g. a linear Gaussian
system), the information needed to perform optimal control is in general a conditional probability
distribution over a hybrid state space, with two discrete states. Thus, finding computational
solutions to such problems requires further understanding of hybrid estimation and the representation
of hybrid probability distributions. One possibility is to explore parameterized families of
functions which provide accurate approximations to the conditional probability distribution. In the
case that the parameterization is finite dimensional, it may be possible to develop approximate
dynamic programming algorithms to compute suboptimal control policies on the space of
parameterizations, with bounds on the suboptimality. An alternative approach is to formulate methods for
computing the optimal safety or reach-avoid probabilities with respect to particular choices of
estimators. Performance comparisons can then be made across the different estimators to decide on
a final design. It is important to note that for the special case in which the discrete mode is known,
and the continuous state estimation error is bounded, one can potentially include the estimation
error as part of the disturbance and hence address the problem within the framework of a DTSHG.
To illustrate this, consider a POdtSHS described as follows.

\[
\begin{aligned}
q(k+1) &\sim \nu_q(\cdot \mid (q(k), x(k)), (a(k), u(k))), \quad q(k) \in Q, \\
x(k+1) &= f(q(k), x(k), u(k), w(k)), \quad x(k) \in \mathbb{R}^n, \\
o(k) &= q(k), \quad y(k) = x(k) + v(k),
\end{aligned}
\tag{6.1}
\]

where $o(k)$ is the discrete observation, $y(k)$ is the continuous observation, $w(k)$ is the process noise,
and $v(k)$ is the continuous state measurement or estimation error, with the probability distribution
$P_v(dv)$ over $\mathbb{R}^n$. Now suppose that the probability distribution of the estimation error has compact
support, namely $P_v(B) = 1$ for some compact Borel set $B \in \mathcal{B}(\mathbb{R}^n)$. We can rewrite the model in
(6.1) as

\[
\begin{aligned}
o(k+1) &\sim \nu_q(\cdot \mid (o(k), y(k) - v(k)), (a(k), u(k))), \quad o(k) \in Q, \\
y(k+1) &= f(o(k), y(k) - v(k), u(k), w(k)) + v(k+1), \quad y(k) \in \mathbb{R}^n.
\end{aligned}
\tag{6.2}
\]

If one were to treat the observation error in a worst-case fashion, then (6.2) describes a DTSHG
model with the disturbance $b(k) = [v(k) \;\; v(k+1)]^T \in B^2$, and the methodology in chapter 4 of
this dissertation applies. However, it can be seen that if the set $B$ is large with respect to the
reachability specification of interest, the results can be conservative. Furthermore, in the case that
the observation error does not feature bounded support, for example in the case of a Gaussian
distribution, then a different analysis technique is required in order to quantify the safety or
reach-avoid probability.
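The source of conservatism can be seen in a scalar special case of (6.2). The sketch below assumes, purely for illustration, a single mode with linear dynamics $x(k+1) = a\,x(k) + u(k)$, no process noise, and an error bound $|v(k)| \le \bar{v}$; none of these choices come from the dissertation.

```python
def next_observation_interval(y, u, a, v_bar):
    """Worst-case range of y(k+1) = a*(y(k) - v(k)) + u(k) + v(k+1),
    with |v(k)| <= v_bar and |v(k+1)| <= v_bar treated as adversarial."""
    center = a * y + u
    # Both error terms can align, so their contributions add.
    half_width = (abs(a) + 1.0) * v_bar
    return center - half_width, center + half_width
```

The width of the interval grows linearly with the error bound, so a safe set that is narrow relative to `v_bar` may admit no input that keeps the whole interval inside it, even when the true state is well within the safe set.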

6.2.4  Consideration  of  Multi-Objective  Problems 

For many safety-critical control applications, the performance specifications consist of both
constraint satisfaction and cost minimization objectives. The former objectives are often of primary
importance in ensuring safe and correct system behavior, and have been studied in this dissertation
in the form of safety and reach-avoid problems, with proper interpretation for reachability
specifications as state constraints and input spaces as control constraints. The latter objectives, on the
other hand, ensure that the controller does not consume more resources than it needs to satisfy the
constraints. For example, in an aircraft conflict resolution scenario, one would ideally like to design
controllers  which  generate  safe  and  fuel-efficient  trajectories.  In  Lygeros  et  al.  (199%),  an  abstract 
design methodology is proposed for a general class of hybrid systems, whereby the different
control objectives are considered in a sequence of design steps according to their order of importance,
with the controller design from the more important objectives serving as constraints for the less
important objectives. Within the context of the controller synthesis algorithms for sampled-data
switched systems (chapter 3) and DTSHG models (chapter 4), this idea can potentially be
implemented through an added layer of dynamic programming, by using the results of the
reachability calculations as input constraints. In the switched system case, these constraints can
be derived from the set-valued control laws given in sections 3.4 and 3.5, while in the DTSHG
case, these constraints can be derived from the sufficient conditions of optimality given in sections
4.4 and 4.6. Some possible issues with such an approach, however, include whether the value
functions of the resulting constrained optimal control problem satisfy necessary properties for dynamic
programming, and also whether computationally tractable algorithms can be formulated.
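As a rough illustration of the added dynamic programming layer, the sketch below minimizes a finite-horizon cost while restricting each decision to a set-valued safe control law. The function names, the deterministic finite-state model, and the cost are hypothetical; in practice `safe_inputs` would come from the reachability computations of chapter 3 or 4.

```python
def constrained_cost_to_go(states, safe_inputs, step, stage_cost, horizon):
    """Finite-horizon cost minimization restricted to safe inputs.

    safe_inputs(s) returns the inputs allowed by the safety design at s.
    step(s, u) must return a state contained in `states`.
    """
    cost = {s: 0.0 for s in states}
    policy = []
    for _ in range(horizon):
        new_cost, decision = {}, {}
        for s in states:
            options = [(stage_cost(s, u) + cost[step(s, u)], u) for u in safe_inputs(s)]
            new_cost[s], decision[s] = min(options) if options else (float("inf"), None)
        cost = new_cost
        policy.insert(0, decision)  # earliest time step first
    return cost, policy


# Toy example (hypothetical): stay within cells 1..3, pay for input effort
# and for distance from cell 2.
states = (1, 2, 3)


def safe_inputs(s):
    return [u for u in (-1, 0, 1) if 1 <= s + u <= 3]


def step(s, u):
    return s + u


def stage_cost(s, u):
    return abs(u) + 0.5 * abs(s - 2)
```

Because the secondary optimization only ever sees inputs that the safety design allows, the more important objective is preserved by construction.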

6.2.5  Accounting  for  Autonomous  Switches  in  Deterministic  Continuous 
Time  Hybrid  Systems 

Hybrid systems with controlled switching, as studied in Part I of this dissertation, are suitable
for application scenarios in which the dynamics of a physical process are well-approximated by a
nonlinear vector field (e.g. the kinematics of an aircraft), and the switching behavior is introduced
through the discrete set of control choices (e.g. flight maneuvers) at the higher levels of abstraction.
However, there are a number of instances in which autonomous discrete transitions, as triggered by
changes in the continuous state, provide a natural abstraction of system behavior. This includes
systems featuring event-triggered finite state machine models in high level control, pre-designed
switching laws between modes of operation, or sharp changes in continuous dynamics caused by
idealized physical modeling (e.g. elastic impacts). Examples of such systems range from automotive
engines (Balluchi et al., 2000) and power electronics (Almér et al., 2007) to bipedal walkers
(Ames et al., 2009).

However, the consideration of autonomous switching for continuous time systems is also
accompanied by a significant increase in the difficulty of analyzing system properties. Specifically,
state dependent switching introduces the possibility for discontinuous vector fields, which can
result in infinitely fast switching or chattering at the switching boundary. This is referred to as Zeno
behavior in the hybrid systems literature (Zhang et al., 2001; Ames et al., 2005). The analysis of
such scenarios typically requires generalized solution concepts for continuous trajectories (Filippov,
1988; Ames et al., 2006). While it may be reasonable in certain cases to work with models
which preclude this behavior, the discontinuities in the vector field would nonetheless violate the
typical assumptions of Lipschitz continuity in the analysis of viscosity solutions to HJB or HJI
equations (Evans and Souganidis, 1984; Bardi and Capuzzo-Dolcetta, 1997).

There are several possibilities for overcoming this difficulty. One direction is to consider
modifications of the Hamilton-Jacobi method for reachable set computation, whereby autonomous
switching is handled through proper definition of boundary conditions (Mitchell, 2002). Another
possibility is to explore the use of explicit reachable set computation techniques which propagate
sets directly under differential flows, and have been applied to examples with autonomous switching
(see for example Asarin et al., 2000a; Chutinan and Krogh, 2003; Botchkarev and Tripakis,
2000; Girard and Le Guernic, 2008). As discussed previously, the controller design methods in
Part I can accommodate alternative reachability analysis techniques as long as computations can
be performed under both existentially and universally quantified inputs. Finally, it appears promising
to apply existing methods based upon viability theory (Cardaliaguet et al., 1999; Aubin et al.,
2002; Saint-Pierre, 2002; Gao et al., 2007). These methods have the advantage of being able to
handle nonlinear dynamics, differential games, and discontinuous vector fields. Some disadvantages,
as compared with the Hamilton-Jacobi approach, include the loss of value function information
outside reachable sets, and the difficulty of assuring subgrid accuracy due to relaxation of the
continuity assumption. The former is not a significant impediment to our controller design procedures,
as we only require representations of sets, rather than value functions. The latter issue will require
further investigation, as checking set membership for a given continuous state away from grid
nodes may require numerical interpolation.

6.2.6  Reducing  Conservativeness  of  Max-min  Solutions  for  DTSHG 

In our discussion of stochastic game formulations of the probabilistic reachability problems, we
primarily assumed an asymmetric information pattern which favors the adversary. While this
allows us to provide robust performance guarantees with respect to the worst-case adversary behavior,
the resulting control policy can also be somewhat conservative, as it assumes that choices of
control are revealed to the adversary at each discrete time instant. However, as we discussed in
section 4.5.2, if one were to consider a symmetric zero-sum game formulation, the existence of
an equilibrium solution would often require randomized policies, as opposed to the deterministic
policies of an asymmetric formulation. This is in stark contrast with continuous time differential
games, in which the saddle-point condition can be satisfied by a large class of nonlinear systems,
whose dynamics are affine in the inputs (i.e. feedback linearizable systems). One intuitive
explanation is that if one were to consider the discrete time system as a sampled continuous time system,
then the choices of inputs on each sampling interval can be interpreted in terms of an open-loop
rather than a feedback game. Thus, for discrete time models that are derived from continuous time
ones, it would be interesting to investigate conditions under which the gap between the upper and
lower values, in this case corresponding to the Stackelberg values of two asymmetric games, would
become smaller as the time discretization step is reduced.

Another possible direction is to explicitly consider the possible use of randomized policies.
While they are not often found within classical control applications, such control policies
are the norm in many modern communication protocols. In particular, the medium access
control scheme of the IEEE 802.11 standard for wireless networks requires a station to select a
random backoff time if it finds the channel to be busy when trying to transmit data, resulting in
the well-known Markov chain model for the 802.11 family of protocols (Bianchi, 2000). When
viewed from a multi-player game perspective, the reason for this randomization becomes somewhat
intuitive. Namely, as each station with a packet to transmit is trying to gain access to a common
medium (i.e. the channel) without knowledge of whether other stations might be transmitting
during the same time slot, selection of randomized transmission times is one way to minimize
the possibility of collision. However, it should also be noted that this very choice of randomization
is one of the reasons that the performance of a wireless communication network is often
much less predictable as compared with a traditional wired network. Thus, in deciding between
deterministic and randomized policies for a given application, one should first consider the practical
implications of a randomized approach. If such an approach is found to be reasonable, then
the next step becomes addressing the computational issues. For cases in which the input space is
continuous, the randomized policy space becomes infinite dimensional, namely it is the set of
probability distributions over the input space. The problem then involves selecting a finite dimensional
parameterization of a subclass of the randomized policies, and finding computationally tractable
algorithms for carrying out the dynamic programming calculations with respect to the choice of
parameterization.
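The gap between the two asymmetric values, and the way randomization closes it, can be illustrated with a one-stage matrix game. The payoff matrix below (a matching-pennies structure, with entries read as safety probabilities) and the grid search over mixing probabilities are illustrative assumptions, not part of the DTSHG formulation.

```python
def lower_value(payoff):
    """Max over rows of min over columns: the control commits first."""
    return max(min(row) for row in payoff)


def upper_value(payoff):
    """Min over columns of max over rows: the adversary commits first."""
    n_cols = len(payoff[0])
    return min(max(row[j] for row in payoff) for j in range(n_cols))


def mixed_value_two_rows(payoff, grid=1001):
    """Approximate randomized max-min value of a 2-row game by scanning
    the probability p of playing row 0 over a uniform grid."""
    best = float("-inf")
    n_cols = len(payoff[0])
    for i in range(grid):
        p = i / (grid - 1)
        worst = min(p * payoff[0][j] + (1 - p) * payoff[1][j] for j in range(n_cols))
        best = max(best, worst)
    return best


payoff = [[1.0, 0.0], [0.0, 1.0]]
```

For this matrix the deterministic values are 0 and 1, while the randomized value lies between them at 0.5. The grid scan is a one-parameter instance of the finite dimensional parameterization of randomized policies mentioned above.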

6.2.7  Expanding  the  Class  of  Permissible  Specifications 

Our work in this dissertation has considered in some detail two types of reachability
specifications. The first class is specifications with safety or invariance objectives, and the second class
is those with target attainability objectives subject to a safety constraint. While they encompass
a large range of atomic performance specifications encountered in practice, the complete design
specifications of real-world control systems often feature a combination of state-based objectives
with temporal-based objectives. The sequential reachability specification considered in chapter 2
provides a simple example of this type of specification. Namely, not only do we want to satisfy
individual safety or reach-avoid specifications, we would also like them to be satisfied in a certain
temporal order. Thus, the problem becomes one of composition between controllers satisfying
individual safety or reach-avoid objectives. One of the immediate extensions would be to investigate
the stochastic counterpart to this design approach for DTSHG models, in particular, whether
nested dynamic programming algorithms can be carried out to compute the max-min probability
of satisfying a sequence of safety and reach-avoid objectives.

Over the longer term, it would be interesting to explore whether sequential reachability
specifications can be expanded to accommodate a richer class of temporal objectives, such as handled by
discrete state model checking languages, including linear temporal logic (LTL), computation tree
logic (CTL), and probabilistic computation tree logic (PCTL). The primitives of these languages,
such as “always $\phi_1$,” “eventually $\phi_2$,” and “$\phi_1$ until $\phi_2$,” where $\phi_1$ and $\phi_2$ are logic statements, have
interpretations in terms of the reachability specifications discussed in this dissertation. Speaking
somewhat informally, suppose that $\phi_1$ is “remain in a safe set $W$” and $\phi_2$ is “reach a target set
$R$,” then the statements given previously correspond to safety, terminal reachability, and
reach-avoid, respectively. The power of these specification languages, however, lies in the combination
of these primitives, along with logic operators to produce complex specifications such as “reach
$R_1$ if a control command $\sigma_1$ is received after visiting $R_2$ or $R_3$, while always avoiding $A_1$ and $A_2$.”
For the interested reader, a comprehensive overview of LTL and CTL can be found in the survey
by Emerson (1990), while a detailed exposition on PCTL can be found in Hansson and Jonsson
(1994).
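The correspondence between the temporal primitives and the reachability specifications can be sketched by evaluating them on a finite state trajectory. This is an informal, finite-trace reading chosen for illustration. It is not the formal semantics of LTL, CTL, or PCTL, and the function names are hypothetical.

```python
def always(traj, in_safe):
    """Safety: every state of the finite trajectory lies in the safe set."""
    return all(in_safe(x) for x in traj)


def eventually(traj, in_target):
    """Reachability: some state of the trajectory lies in the target set."""
    return any(in_target(x) for x in traj)


def until(traj, in_safe, in_target):
    """Reach-avoid: the target is hit, and every earlier state is safe."""
    for x in traj:
        if in_target(x):
            return True
        if not in_safe(x):
            return False
    return False  # target never reached within the finite trajectory


# Example sets on the real line (hypothetical).
def in_safe(x):
    return 0 <= x <= 10


def in_target(x):
    return 8 <= x <= 10
```

Composite specifications are then built by combining such checks with logic operators, which is where the discrete structure over the atomic objectives arises.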

Efforts to extend synthesis algorithms for temporal logic specifications to systems with
continuous dynamics include the work by Belta et al. (2005); Tabuada and Pappas (2006); Kloetzer and
Belta (2008); Fainekos et al. (2009); Kress-Gazit et al. (2009). These methods typically proceed
by discretizing the state space into a finite number of partitions, and designing continuous
controllers to satisfy objectives of staying within a partition or reaching another partition in finite time.
From the point of view of high level control, the partitions become the states of a discrete
abstraction, with the continuous controllers implementing the state transitions in the discrete abstraction.
The solution to the continuous synthesis problem can then be obtained from the result of a
discrete synthesis algorithm. However, due to the difficulties of constructing continuous controllers
implementing the required transition behaviors, applications of these methods have been mostly
restricted to systems with affine continuous dynamics.

Combining the insights from these previous works with our experiences in addressing the
sequential reachability specification, it would appear that the problem of synthesizing controllers to
satisfy state-based objectives in conjunction with temporal-based objectives is inherently a hybrid
control problem. Namely, a discrete structure is induced by the temporal objectives over the set of
atomic state-based reachability objectives. To be somewhat more concrete, in the case of the
sequential reachability problem, the discrete structure is given by a sequence of transition states and
stationary states, while the atomic reachability objectives are given by the reach-avoid objectives
within the transition states and the invariance objectives within the stationary states. This then
suggests a two-stage approach to the synthesis problem, whereby the discrete structure is inferred
from a given specification during the first stage, and the atomic controllers are constructed during
the second stage, with proper considerations for composition between the atomic controllers. With
improvements in the computational efficiency of reachability analysis, it is the hope of the author
that such an approach would provide an avenue for addressing a range of interesting controller
design problems arising in practical applications.

Appendix A

Proof  of  Lemma  4.3 


First  we  recall  the  notion  of  a  simple  function  (see  for  example  Folland,  1999). 

Definition A.1. Let $(X, \mathcal{B}(X))$ be a Borel space. A simple function on $X$ is a finite linear
combination, with complex coefficients, of characteristic functions of sets in $\mathcal{B}(X)$.

More concretely, a simple function is of the form $f = \sum_{k=1}^{n} z_k \mathbf{1}_{E_k}$, where $z_k \in \mathbb{C}$ and $E_k \in \mathcal{B}(X)$.
Below is a result for approximation of measurable functions by simple functions, stated as Theorem
2.10(a) in Folland (1999).

Lemma A.1. Let $(X, \mathcal{B}(X))$ be a Borel space. If $f : X \to [0, \infty)$ is bounded and measurable, then
there exists a sequence $\{\phi_n\}$ of simple functions with real coefficients such that
$0 \le \phi_1 \le \phi_2 \le \cdots \le f$, and $\phi_n \to f$ uniformly on $X$.
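One standard way to obtain such a sequence is the dyadic construction: round $f$ down to the nearest multiple of $2^{-n}$ and cap the result at $2^n$. The sketch below evaluates this construction pointwise. It is offered as an illustration of the lemma under that assumed construction, not as a quotation of the cited proof.

```python
import math


def simple_approx(value, n):
    """Value of the n-th dyadic simple function at a point where f = value.

    Assumed construction: round f down to a multiple of 2**-n and cap the
    result at 2**n, so the function takes finitely many values.
    """
    return min(math.floor(value * 2**n) / 2**n, float(2**n))
```

Once $2^n$ exceeds the bound on $f$, the error at every point is at most $2^{-n}$, which gives the uniform convergence claimed in the lemma.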

We will also need the following well-known result from real analysis (stated as Theorem 7.11
in Rudin (1976)).

Lemma A.2. Let $\{f_n\}$, $n = 1, 2, \ldots$, and $f$ be real-valued functions on a set $E$ in a metric space $X$
such that $f_n \to f$ uniformly on $E$. Let $x$ be a limit point of $E$, and suppose that

\[ \lim_{t \to x} f_n(t) = A_n \]

for $n = 1, 2, \ldots$. Then $\{A_n\}$ converges, and

\[ \lim_{t \to x} f(t) = \lim_{n \to \infty} A_n. \]

This result essentially allows an exchange of limits

\[ \lim_{t \to x} \lim_{n \to \infty} f_n(t) = \lim_{n \to \infty} \lim_{t \to x} f_n(t) \]

when the convergence of $f_n$ to $f$ is uniform.
The  proof  now  proceeds  as  follows. 
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Proof.  With  the  observation  that 


I  f(y)t(dy\x)  =  I f+  (y)t(dy\x)  -  f  f  ( y)t(dy\x ) 

where  f+  and  f~  are  the  positive  and  negative  parts  of  /,  we  can  consider,  without  loss  of  gener¬ 
ality,  the  case  of  /  >  0. 

Let  .vo  be  a  limit  point  of  X  and  {xm}fn=x  be  a  sequence  in  X  such  that  xm  — >■  xq  as  m  — >  For 
each  m  >  0,  there  exists  a  Borel-measurable  function  fm  on  Y  and  a  Borel  subset  Bm  of  Y  such  that 
/  =  fm  on  Bm  and  t(Bm \xm)  —  1  (Lemma  7.27  of  Bertsekas  and  Shreve  (1978)).  Let  B  =  (J,„>0  Bm, 
then  t(B\xm)  =  1  ,Vm  >  0.  Define  a  function  /  :  B  — >  [0,°°)  by  f(x)  =  fm{x),  if  x  G  Bm.  This 
definition  is  possible  since  for  any  m\  and  m2  such  that  x  G  B„n  fl  B,„2,  we  have 

fmx{x)  =f(x)  =fm2{x) 

Furthermore,  /  is  also  Borel-measurable  on  B  under  the  observation  that  for  any  Borel  subset  A  of 

[0,°o) 

r'(A)=  U 

m>  0 

K 

By  Lemma  A.l,  there  exists  a  sequence  of  simple  functions  {0,,}  of  the  form  (j)n  =  En, 

where  znk>  0  and  Ej'  G  YXfY),  such  that  0  <  (f)\  <  (b  <  •  •  •  <  j\  and  — >  f  uniformly  on  Y . 

Define  a  function  g  :  X  — >  [0,  °°)  as 

SO)  =  /  f(y)t{dy\x) 

Jb 

and  functions  gn  :  X  — >  [0,°°),  n  G  N  as 

8n(x)  =  /  (j)n(y)t{dy\x) 

Jb 

Then  by  the  Monotone  Convergence  Theorem  (see  for  example  Folland,  1999,  Theorem  2.14), 
g(x)  =  1  i m„  ^ g„  (v) ,  y.v  G  X.  Furthermore,  we  claim  that  this  convergence  is  uniform.  Indeed, 
given  the  uniform  convergence  of  (j)„  to  /,  we  have  for  every  e  >  0  some  N  G  N  such  that 

f(y)  -  tyn(y)  <  £,  Vy  eY,n>N 

Thus,  for  any  xGl  and  n  >  N,  we  have 

g{x) -  gn(x )  =  f  (f(y)  -  ( K(y))t(dy\x )  <  £ 

Jb 

which  completes  the  proof  of  the  claim. 

Now for each $m \ge 0$, $n \in \mathbb{N}$, the definition of Lebesgue integrals implies

$$g_n(x_m) = \int_B \phi_n(y)\,t(dy|x_m) = \sum_{k=1}^{K_n} z_k^n\, t(E_k^n|x_m).$$

By the continuity assumption on $t$, $t(E_k^n|x_m) \to t(E_k^n|x_0)$ as $m \to \infty$. Thus,

$$\lim_{m \to \infty} g_n(x_m) = \sum_{k=1}^{K_n} z_k^n\, t(E_k^n|x_0) = \int_B \phi_n(y)\,t(dy|x_0).$$

Applying Lemma A.2, we conclude that

$$\lim_{m \to \infty} \int_B \tilde f(y)\,t(dy|x_m) = \lim_{m \to \infty} g(x_m) = \lim_{n \to \infty} \int_B \phi_n(y)\,t(dy|x_0) = \int_B \tilde f(y)\,t(dy|x_0),$$

where  the  last  equality  follows  by  a  repeated  application  of  the  Monotone  Convergence  Theorem. 
The  statement  of  Lemma  4.3  now  follows  directly: 


$$\int_Y f(y)\,t(dy|x_m) = \int_B f(y)\,t(dy|x_m) = \int_B \tilde f(y)\,t(dy|x_m)$$

$$\to \int_B \tilde f(y)\,t(dy|x_0) = \int_B f(y)\,t(dy|x_0) = \int_Y f(y)\,t(dy|x_0)$$

as $m \to \infty$, which completes the proof. □




Appendix  B 

Proof  of  Proposition  4.7 


Proof. From the proof of Lemma 4.5, we have that $r^N_{s_0}(R, W') = \mathcal{T}^N(\mathbf{1}_R)(s_0)$ is monotonically increasing in $N$ for every $s_0 \in S$. Thus, by the definition of $V_\infty$ in (4.47), it can be inferred that for each $N \ge 1$,

$$r^N_{s_0}(R, W') \le V_\infty(s_0), \quad \forall s_0 \in S.$$

By the monotonicity of the operator $\mathcal{T}$, it then follows that

$$r^{N+1}_{s_0}(R, W') \le \mathcal{T}(V_\infty)(s_0), \quad \forall s_0 \in S,\ N \in \mathbb{N}.$$

Taking the limit on the left hand side of this expression, we arrive at the inequality

$$V_\infty(s_0) \le \mathcal{T}(V_\infty)(s_0), \quad \forall s_0 \in S.$$


To show that the reverse inequality also holds, we define for notational convenience the functions $V_k : S \to [0,1]$ as $V_k := \mathcal{T}^k(\mathbf{1}_R)$, $k \ge 0$. Clearly, for every $s_0 \in S$,

$$V_\infty(s_0) \ge \mathcal{T}^{N+1}(\mathbf{1}_R)(s_0) = \mathcal{T}(V_N)(s_0) \ge \inf_{b \in C_b} \mathbf{1}_R(s_0) + \mathbf{1}_{W' \setminus R}(s_0)\, H(s_0, a, b, V_N), \quad \forall a \in C_a. \tag{B.1}$$

By Proposition 4.1, there exists a Borel-measurable function $g^*_N : S \times C_a \to C_b$, which achieves the infimum in equation (B.1) for any fixed $(s_0, a) \in S \times C_a$. This then implies the inequality

$$V_\infty(s_0) \ge \mathbf{1}_R(s_0) + \mathbf{1}_{W' \setminus R}(s_0)\, H(s_0, a, g^*_N(s_0, a), V_N) \tag{B.2}$$

for every $s_0 \in S$, $a \in C_a$, and $N \ge 1$.

Given that the player II action space $C_b$ is compact, the sequence $\{g^*_N(s_0, a)\}_{N=1}^{\infty}$ has a subsequence $\{g^*_{N_k}(s_0, a)\}_{k=1}^{\infty}$ which converges to some point $b^*_{(s_0,a)} \in C_b$ (see for example Rudin, 1976, Theorem 3.6). For any fixed $(s_0, a) \in S \times C_a$, we relabel the sequence $\{g^*_{N_k}(s_0, a)\}_{k=1}^{\infty}$ as $\{b^k_{(s_0,a)}\}_{k=1}^{\infty}$.

Now define a function $F_k : S \times C_a \times C_b \to [0,1]$ by

$$F_k(s_0, a, b) = \mathbf{1}_R(s_0) + \mathbf{1}_{W' \setminus R}(s_0)\, H(s_0, a, b, V_{N_k}).$$

Some  useful  properties  of  the  operator  H  are  given  below: 

•  For any Borel-measurable functions $J, J' \in \mathcal{F}$ such that $J \le J'$, $H(s, a, b, J) \le H(s, a, b, J')$, $\forall s \in S$, $a \in C_a$, $b \in C_b$.

•  For any sequence of Borel-measurable functions $J_k \in \mathcal{F}$ such that $J_0 \le J_k \le J_{k+1}$ for all $k$ and $\lim_{k \to \infty} J_k = J$, $\lim_{k \to \infty} H(s, a, b, J_k) = H(s, a, b, J)$, $\forall s \in S$, $a \in C_a$, $b \in C_b$.

The first property can be directly inferred from the definition of $H$, while the second property follows by an application of the Monotone Convergence Theorem (see for example Folland, 1999, Theorem 2.14).

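Using these properties and the fact that $V_{N_k}$ is a sequence of monotonically increasing functions converging to $V_\infty$, we have for every $s_0 \in S$, $a \in C_a$, $b \in C_b$ that

$$F_k(s_0, a, b) \le F_{k+1}(s_0, a, b), \quad k \ge 1 \tag{B.3}$$

$$\lim_{k \to \infty} F_k(s_0, a, b) = \mathbf{1}_R(s_0) + \mathbf{1}_{W' \setminus R}(s_0)\, H(s_0, a, b, V_\infty). \tag{B.4}$$

Consider a function $F : S \times C_a \times C_b \to [0,1]$ defined as

$$F(s_0, a, b) := \mathbf{1}_R(s_0) + \mathbf{1}_{W' \setminus R}(s_0)\, H(s_0, a, b, V_\infty).$$

Using (B.3) and (B.4), we will proceed to show that the following inequality holds:

$$\sup_{k \in \mathbb{N}} F_k(s_0, a, b^k_{(s_0,a)}) \ge F(s_0, a, b^*_{(s_0,a)}), \quad \forall s_0 \in S,\ a \in C_a. \tag{B.5}$$

This combined with (B.2) would then imply

$$V_\infty(s_0) \ge \sup_{k \in \mathbb{N}} F_k(s_0, a, b^k_{(s_0,a)}) \ge F(s_0, a, b^*_{(s_0,a)}),$$

for every $s_0 \in S$ and $a \in C_a$, and hence

$$V_\infty(s_0) \ge \sup_{a \in C_a} \inf_{b \in C_b} F(s_0, a, b) = \mathcal{T}(V_\infty)(s_0), \quad \forall s_0 \in S.$$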


In order to show (B.5), we first observe that by (B.4),

$$\lim_{k \to \infty} F_k(s_0, a, b^*_{(s_0,a)}) = F(s_0, a, b^*_{(s_0,a)}), \quad \forall s_0 \in S,\ a \in C_a.$$

Now fix $s_0 \in S$ and $a \in C_a$. For any $\varepsilon > 0$, it then follows that there exists some $N \in \mathbb{N}$ such that

$$F_N(s_0, a, b^*_{(s_0,a)}) \ge F(s_0, a, b^*_{(s_0,a)}) - \varepsilon.$$

By the assumptions placed on the DTSHG, $F_N(s_0, a, \cdot)$ is a continuous function on $C_b$, which implies

$$\lim_{k \to \infty} F_N(s_0, a, b^k_{(s_0,a)}) = F_N(s_0, a, b^*_{(s_0,a)}).$$

This in turn implies that there exists $K \in \mathbb{N}$ such that for every $k \ge K$,

$$F_N(s_0, a, b^k_{(s_0,a)}) \ge F_N(s_0, a, b^*_{(s_0,a)}) - \varepsilon.$$

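Now we consider two cases. First, suppose $N \ge K$. Then

$$F_N(s_0, a, b^N_{(s_0,a)}) \ge F_N(s_0, a, b^*_{(s_0,a)}) - \varepsilon \ge F(s_0, a, b^*_{(s_0,a)}) - 2\varepsilon.$$

Second, suppose $N < K$. Then by (B.3),

$$F_K(s_0, a, b^K_{(s_0,a)}) \ge F_N(s_0, a, b^K_{(s_0,a)}) \ge F_N(s_0, a, b^*_{(s_0,a)}) - \varepsilon \ge F(s_0, a, b^*_{(s_0,a)}) - 2\varepsilon.$$

Since $\varepsilon$ is arbitrary, (B.5) then follows. This completes the proof. □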
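The fixed-point property established above can be illustrated on a toy finite game. The sketch below is only an illustration under stated assumptions: the state space, the sets `R` and `W`, the action sets, and the kernel `tau` are all hypothetical, and $H$ is assumed to be the expectation of the value function under `tau`, which may differ from the definition used in Chapter 4.

```python
def apply_T(V, R, W, actions_a, actions_b, tau):
    """One application of 1_R + 1_{W \\ R} * sup_a inf_b E[V(next state)]."""
    new_V = {}
    for s in V:
        if s in R:
            new_V[s] = 1.0            # target already reached
        elif s in W:
            new_V[s] = max(
                min(sum(p * V[s2] for s2, p in tau(s, a, b).items())
                    for b in actions_b)
                for a in actions_a)
        else:
            new_V[s] = 0.0            # outside the safe set
    return new_V

def tau(s, a, b):
    # Hypothetical kernel on states 0..3: player I (a) pushes up, player II (b) resists.
    p_up = (0.6 if a == 1 else 0.3) - (0.2 if b == 1 else 0.0)
    return {s + 1: p_up, s - 1: 0.2, s: 0.8 - p_up}

STATES, R, W = [0, 1, 2, 3], {3}, {1, 2, 3}

def iterate(n_steps):
    # Returns the sequence V_0 = 1_R, V_1, ..., V_n.
    V = {s: (1.0 if s in R else 0.0) for s in STATES}
    history = [V]
    for _ in range(n_steps):
        V = apply_T(V, R, W, [0, 1], [0, 1], tau)
        history.append(V)
    return history

history = iterate(200)
V_inf = history[-1]
print(V_inf)
```

In this toy model the iterates increase monotonically and settle at a function that is unchanged by one more application of the operator, mirroring the two inequalities combined in the proof. Compactness of the player II action set is trivial here because the set is finite.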
Appendix C

Proof  of  Lemma  5.1 


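Proof. First, we note that given $p \in \mathcal{P}(Q)$ and $\sigma \in C_a$, the prediction equation (5.13) in section 5.3.2 becomes

$$\Psi(p, \sigma)(q') = \sum_{q \in Q} p_q(q'|q, \sigma)\, p(q), \quad q' \in Q.$$

Furthermore, the stochastic kernels $\Phi_0$ and $\Phi$ satisfying equations (5.11) and (5.12) in section 5.3.2 can be defined as

$$\Phi_0(q|p; o) = \Phi(q|p; o, \sigma) = \frac{p_o(o|q)\, p(q)}{\sum_{q' \in Q} p_o(o|q')\, p(q')}, \quad q \in Q,$$

if $\sum_{q' \in Q} p_o(o|q')\, p(q') \ne 0$ and

$$\Phi_0(q|p; o) = \Phi(q|p; o, \sigma) = \bar p(q), \quad q \in Q,$$

where $\bar p \in \mathcal{P}(Q)$ is arbitrary, if $\sum_{q' \in Q} p_o(o|q')\, p(q') = 0$. Now fix $p_0 \in \mathcal{P}(Q)$ and $\pi' \in \Pi'$. Define $p_k(\xi(p_0); i_k)$ recursively through the filtering equation (5.14) as

$$p_0(\xi(p_0); i_0)(q_0) = \Phi_0(q_0|\xi(p_0); o_0), \quad q_0 \in Q,$$

$$p_{k+1}(\xi(p_0); i_{k+1})(q_{k+1}) = \Phi(q_{k+1}|\Psi(p_k(\xi(p_0); i_k), \sigma_k); o_{k+1}, \sigma_k), \quad q_{k+1} \in Q.$$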
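The prediction and update steps of this recursion can be sketched for a finite mode set. The two-mode model below, including the mode names, the action labels `"stay"` and `"flip"`, the observation labels, and every probability value, is a hypothetical illustration rather than a model from Chapter 5. The `fallback` argument plays the role of the arbitrary distribution used when the normalizing sum is zero.

```python
def predict(p, sigma, p_q):
    """Prediction: returns q' -> sum_q p_q(q' | q, sigma) * p(q)."""
    return {q2: sum(p_q[(q2, q, sigma)] * p[q] for q in p) for q2 in p}

def update(p, o, p_o, fallback):
    """Bayes update on observation o; returns `fallback` if o has zero likelihood."""
    norm = sum(p_o[(o, q)] * p[q] for q in p)
    if norm == 0.0:
        return dict(fallback)  # arbitrary distribution on the null event
    return {q: p_o[(o, q)] * p[q] / norm for q in p}

# Hypothetical two-mode model; all numbers are illustrative only.
P_Q = {("on", "on", "stay"): 0.9, ("off", "on", "stay"): 0.1,
       ("on", "off", "stay"): 0.1, ("off", "off", "stay"): 0.9,
       ("on", "on", "flip"): 0.2, ("off", "on", "flip"): 0.8,
       ("on", "off", "flip"): 0.8, ("off", "off", "flip"): 0.2}
P_O = {("hi", "on"): 1.0, ("lo", "on"): 0.0,
       ("hi", "off"): 0.3, ("lo", "off"): 0.7}
UNIFORM = {"on": 0.5, "off": 0.5}

def filter_step(p, sigma, o):
    # One step of the recursion: predict under sigma, then update on o.
    return update(predict(p, sigma, P_Q), o, P_O, UNIFORM)

p0 = update(UNIFORM, "hi", P_O, UNIFORM)
p1 = filter_step(p0, "stay", "lo")
print(p0, p1)
```

The fallback branch is reached only when the observation has zero probability under the current belief, for example observing `"lo"` when the belief puts all its mass on `"on"`. The remainder of the proof shows that such events have measure zero, which is why the arbitrary choice made there does not matter.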


Then by the definitions of $\Psi$, $\Phi_0$ and $\Phi$, it is sufficient to show that the following events

$$E_0 = \Big\{ (q_0, o_0) \in \Omega_0 : \sum_{q'_0 \in Q} p_o(o_0|q'_0)\, \xi(p_0)(q'_0) = 0 \Big\},$$

$$E_k = \Big\{ (q_0, o_0, \sigma_0, \ldots, q_{k-1}, o_{k-1}, \sigma_{k-1}, q_k, o_k) \in \Omega_k : \sum_{q'_k \in Q} \sum_{q'_{k-1} \in Q} p_o(o_k|q'_k)\, p_q(q'_k|q'_{k-1}, \sigma_{k-1})\, p_{k-1}(\xi(p_0); i_{k-1})(q'_{k-1}) = 0 \Big\}, \quad k \ge 1,$$

have $P_k(\pi', \xi(p_0))$ measure zero for every $k$.

Indeed, for $k = 0$ and any set $Q \times \{o_0\} \subset E_0$, we have

$$P_0(\pi', \xi(p_0))(Q \times \{o_0\}) = \sum_{q_0 \in Q} p_o(o_0|q_0)\, \xi(p_0)(q_0) = 0,$$

from which it follows that $P_0(\pi', \xi(p_0))(E_0) = 0$.

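For $k \ge 1$, we note that by Lemma 10.4 of Bertsekas and Shreve (1978), there exists a set $\hat I_{k-1} \subset I_{k-1}$ with $P_{k-1}(\pi', \xi(p_0))$ measure one on which $p_{k-1}(\xi(p_0); i_{k-1})$ is the conditional probability distribution of $q_{k-1}$ given $\xi(p_0)$ and $i_{k-1}$. Thus, for any set $Q^{k+1} \times \{i_k\} \subset E_k$ such that $i_{k-1} \notin \hat I_{k-1}$, $P_k(\pi', \xi(p_0))(Q^{k+1} \times \{i_k\}) = 0$. On the other hand, for any set $Q^{k+1} \times \{i_k\} \subset E_k$ such that $i_{k-1} \in \hat I_{k-1}$,

$$\begin{aligned}
P_k(\pi', \xi(p_0))(Q^{k+1} \times \{i_k\}) &= \sum_{(q_0, \ldots, q_k) \in Q^{k+1}} p_o(o_k|q_k)\, p_q(q_k|q_{k-1}, \sigma_{k-1}) \\
&\qquad \times \pi'_{k-1}(\sigma_{k-1}|\xi(p_0); i_{k-1})\, P_{k-1}(\pi', \xi(p_0))(q_0, \ldots, q_{k-1}, i_{k-1}) \\
&= \sum_{(q_0, \ldots, q_{k-1}) \in Q^k} \sum_{q'_{k-1} \in Q} \sum_{q_k \in Q} p_o(o_k|q_k)\, p_q(q_k|q'_{k-1}, \sigma_{k-1}) \\
&\qquad \times \pi'_{k-1}(\sigma_{k-1}|\xi(p_0); i_{k-1})\, p_{k-1}(\xi(p_0); i_{k-1})(q'_{k-1}) \\
&\qquad \times P_{k-1}(\pi', \xi(p_0))(q_0, \ldots, q_{k-1}, i_{k-1}) = 0.
\end{aligned}$$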

Hence, $P_k(\pi', \xi(p_0))(E_k) = 0$. The statement of the lemma then follows. □


Appendix D

Proof  of  Lemma  5.2 


Proof. Fix any policy $\pi' \in \Pi'$. For the case of $k = 0$, consider a stochastic kernel $\Phi_0(\cdot|\xi(p_0); z_0)$ defined by

$$\Phi_0(B_0|\xi(p_0); z_0) = \frac{\int_{B_0} p_{z,0}(z_0|s_0)\, \xi(p_0)(s_0)\, ds_0}{\int_S p_{z,0}(z_0|s'_0)\, \xi(p_0)(s'_0)\, ds'_0},$$

for $B_0 \in \mathcal{B}(S)$, if $\int_S p_{z,0}(z_0|s'_0)\, \xi(p_0)(s'_0)\, ds'_0 \ne 0$ and

$$\Phi_0(B_0|\xi(p_0); z_0) = \bar p(B_0),$$

for $B_0 \in \mathcal{B}(S)$, if $\int_S p_{z,0}(z_0|s'_0)\, \xi(p_0)(s'_0)\, ds'_0 = 0$, where $\bar p \in \mathcal{P}(S)$ is arbitrary. By Proposition 7.29 of Bertsekas and Shreve (1978), the function $\lambda_0 : Z \to \mathbb{R}$ defined as

$$\lambda_0(z_0) = \int_S p_{z,0}(z_0|s'_0)\, \xi(p_0)(s'_0)\, ds'_0$$

is Borel-measurable. Hence, the set

$$\bar I_0 := \Big\{ z_0 \in Z : \int_S p_{z,0}(z_0|s'_0)\, \xi(p_0)(s'_0)\, ds'_0 = 0 \Big\}$$

is also Borel-measurable. It then follows that $\Phi_0(B_0|\xi(p_0); z_0)$ as defined above is a Borel-measurable stochastic kernel. Furthermore, it can be checked in a straightforward manner that $\Phi_0$ satisfies equation (5.11) in section 5.3.2. By the filtering procedure in equation (5.14), we have

$$p_0(\xi(p_0); z_0) = \Phi_0(\cdot|\xi(p_0); z_0).$$

Let $E_0 \subset \Omega_0$ be the set of events such that $\int_S p_{z,0}(z_0|s'_0)\, \xi(p_0)(s'_0)\, ds'_0 = 0$. Clearly, for $(s_0, z_0) \notin E_0$, $p_0(\xi(p_0); z_0)$ has the density $\hat p_0(\cdot|\xi(p_0); z_0)$ as given in the statement of the lemma. It remains to be shown that $E_0$ has $P_0(\pi', \xi(p_0))$ measure zero. Indeed, by the observation that $E_0 = S \times \bar I_0$ and Fubini's theorem,

$$P_0(\pi', \xi(p_0))(E_0) = \int_S \int_{\bar I_0} p_{z,0}(z_0|s_0)\, \xi(p_0)(s_0)\, dz_0\, ds_0 = \int_{\bar I_0} \int_S p_{z,0}(z_0|s_0)\, \xi(p_0)(s_0)\, ds_0\, dz_0 = 0.$$

For the inductive step, we assume that for some $k \ge 1$, $p_{k-1}(\xi(p_0); i_{k-1})$ has the probability density $\hat p_{k-1}(\cdot|\xi(p_0); i_{k-1})$ for $P_{k-1}(\pi', \xi(p_0))$ almost every $i_{k-1}$. We note that given $p \in \mathcal{P}(S)$ and $a \in C_a$, the prediction equation (5.13) in section 5.3.2 becomes

$$\Psi(p, a)(S') = \int_S \left( \int_{S'} p_s(s'|s, a)\, ds' \right) p(ds), \quad S' \in \mathcal{B}(S).$$

By an application of Fubini's theorem, $\Psi(p, a)$ has the density function

$$s' \mapsto \int_S p_s(s'|s, a)\, p(ds).$$


As before, a stochastic kernel satisfying equation (5.12) in section 5.3.2 can be defined as

$$\Phi(B|p; z, a) = \frac{\int_B p_z(z|s, a)\, p(ds)}{\int_S p_z(z|s', a)\, p(ds')},$$

for $B \in \mathcal{B}(S)$, if $\int_S p_z(z|s', a)\, p(ds') \ne 0$ and

$$\Phi(B|p; z, a) = \bar p(B),$$

for $B \in \mathcal{B}(S)$, if $\int_S p_z(z|s', a)\, p(ds') = 0$, where $\bar p \in \mathcal{P}(S)$ is arbitrary. Define a set

$$C_k := \Big\{ i_k \in I_k : \int_S \int_S p_z(z_k|s_k, a_{k-1})\, p_s(s_k|s_{k-1}, a_{k-1})\, p_{k-1}(\xi(p_0); i_{k-1})(ds_{k-1})\, ds_k = 0 \Big\}.$$

For $i_k \notin C_k$, we have by equation (5.14) in section 5.3.2 that for every $B \in \mathcal{B}(S)$,

$$\begin{aligned}
p_k(\xi(p_0); i_k)(B) &= \Phi(B|\Psi(p_{k-1}(\xi(p_0); i_{k-1}), a_{k-1}); z_k, a_{k-1}) \\
&= \frac{\int_B p_z(z_k|s_k, a_{k-1})\, p_{k|k-1}(s_k|\xi(p_0); i_{k-1}, a_{k-1})\, ds_k}{\int_S p_z(z_k|s'_k, a_{k-1})\, p_{k|k-1}(s'_k|\xi(p_0); i_{k-1}, a_{k-1})\, ds'_k},
\end{aligned}$$

where

$$p_{k|k-1}(s_k|\xi(p_0); i_{k-1}, a_{k-1}) = \int_S p_s(s_k|s_{k-1}, a_{k-1})\, p_{k-1}(\xi(p_0); i_{k-1})(ds_{k-1}).$$

By the induction hypothesis, there exists a set $\bar I_{k-1} \subset I_{k-1}$ with $P_{k-1}(\pi', \xi(p_0))$ measure zero, away from which $p_{k-1}(\xi(p_0); i_{k-1})$ has the probability density $\hat p_{k-1}(\cdot|\xi(p_0); i_{k-1})$. It then follows that for $i_k \in (\bar I^c_{k-1} \times C_a \times Z) \cap C^c_k$, $p_k(\xi(p_0); i_k)$ has the probability density $\hat p_k(\cdot|\xi(p_0); i_k)$. With the observation that

$$P_k(\pi', \xi(p_0))(S^k \times \bar I_{k-1} \times C_a \times S \times Z) = P_{k-1}(\pi', \xi(p_0))(S^k \times \bar I_{k-1}) = 0,$$

it is sufficient to show that the set $C_k$ has $P_k(\pi', \xi(p_0))$ measure zero. For notational convenience, we denote by $C_k(i_{k-1})$, $i_{k-1} \in I_{k-1}$, the $i_{k-1}$-section of $C_k$, namely

$$C_k(i_{k-1}) := \{ (a_{k-1}, z_k) \in C_a \times Z : (i_{k-1}, a_{k-1}, z_k) \in C_k \}.$$

It can be checked that $C_k(i_{k-1})$ is a Borel subset of $C_a \times Z$ for every $i_{k-1} \in I_{k-1}$. By Lemma 10.4 of Bertsekas and Shreve (1978), there exists a subset $\hat I_{k-1}$ of the information space $I_{k-1}$ with $P_{k-1}(\pi', \xi(p_0))$ measure one on which $p_{k-1}(\xi(p_0); i_{k-1})$ is the conditional probability distribution of $s_{k-1}$ given $\xi(p_0)$ and $i_{k-1}$. Using this fact and Fubini's theorem, we can deduce that


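$$\begin{aligned}
P_k(\pi', \xi(p_0))(S^{k+1} \times C_k) &= \int_{S^{k+1}} \int_{C_k} p_z(z_k|s_k, a_{k-1})\, dz_k\, p_s(s_k|s_{k-1}, a_{k-1})\, ds_k \\
&\qquad \times \pi'_{k-1}(da_{k-1}|\xi(p_0); i_{k-1})\, dP_{k-1}(\pi', \xi(p_0)) \\
&= \int_{S^k} \int_{I_{k-1}} \int_S \int_{C_k(i_{k-1})} p_z(z_k|s_k, a_{k-1})\, dz_k\, p_s(s_k|s_{k-1}, a_{k-1}) \\
&\qquad \times \pi'_{k-1}(da_{k-1}|\xi(p_0); i_{k-1})\, ds_k\, dP_{k-1}(\pi', \xi(p_0)) \\
&= \int_{\hat I_{k-1}} \int_S \int_S \int_{C_k(i_{k-1})} p_z(z_k|s_k, a_{k-1})\, dz_k\, p_s(s_k|s_{k-1}, a_{k-1}) \\
&\qquad \times \pi'_{k-1}(da_{k-1}|\xi(p_0); i_{k-1})\, ds_k\, p_{k-1}(\xi(p_0); i_{k-1})(ds_{k-1}) \\
&\qquad \times dP_{k-1}(\pi', \xi(p_0)) \\
&= \int_{\hat I_{k-1}} \int_{C_k(i_{k-1})} \int_S \int_S p_z(z_k|s_k, a_{k-1})\, p_s(s_k|s_{k-1}, a_{k-1}) \\
&\qquad \times p_{k-1}(\xi(p_0); i_{k-1})(ds_{k-1})\, ds_k\, dz_k\, \pi'_{k-1}(da_{k-1}|\xi(p_0); i_{k-1}) \\
&\qquad \times dP_{k-1}(\pi', \xi(p_0)) = 0.
\end{aligned}$$

The statement of the lemma then follows. □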


